(54) Title: NOVEL HUMAN GENES AND GENE EXPRESSION PRODUCTS

(57) Abstract: The invention provides novel polynucleotides. The invention further provides novel members of protein families,
and polynucleotides that are differentially expressed in cancer cells relative to normal cells, and in metastatic cancer cells relative to 
normal cells or non-metastatic cancer cells. 
NOVEL HUMAN GENES AND GENE EXPRESSION PRODUCTS

FIELD OF THE INVENTION 

The present invention relates to novel polynucleotides of human origin 
and the encoded gene products. 

BACKGROUND OF THE INVENTION

Identification of novel polynucleotides, particularly those that encode an 
expressed gene product, is important in the advancement of drug discovery, diagnostic 
technologies, and the understanding of the progression and nature of complex diseases 
such as cancer. Identification of genes expressed in different cell types isolated from 
sources that differ in disease state or stage, developmental stage, exposure to various
environmental factors, the tissue of origin, the species from which the tissue was 
isolated, and the like is key to identifying the genetic factors that are responsible for the 
phenotypes associated with these various differences. 

This invention provides novel human polynucleotides, the polypeptides
encoded by these polynucleotides, and the genes and proteins corresponding to these
novel polynucleotides.

SUMMARY OF THE INVENTION 

This invention relates to novel human polynucleotides and variants
thereof, their encoded polypeptides and variants thereof, to genes corresponding to these
polynucleotides and to proteins expressed by the genes. The invention also relates to
diagnostics and therapeutics comprising such novel human polynucleotides, their 
corresponding genes or gene products, including probes, antisense nucleotides, and 
antibodies. The polynucleotides of the invention correspond to a polynucleotide 
comprising the sequence information of at least one of SEQ ID NOs: 1-3351. 

Various aspects and embodiments of the invention will be readily
apparent to the ordinarily skilled artisan upon reading the description provided herein.

DETAILED DESCRIPTION OF THE INVENTION 

The invention relates to polynucleotides comprising the disclosed
nucleotide sequences, to full length cDNA, mRNA, genomic sequences, and genes
corresponding to these sequences and degenerate variants thereof, and to polypeptides
encoded by the polynucleotides of the invention and polypeptide variants.

Polypeptide variants differ from wild type protein in having one or more 
amino acid substitutions that either enhance, add, or diminish a biological activity of the 
wild type protein.

Six of the polynucleotides disclosed herein encode new members of the MKK
kinase family; the coding region is found within the nucleotide region in parentheses: SEQ
ID NO:29 (nucleotides 295-421); SEQ ID NO:31 (298-397); SEQ ID NO:196 (37-322);
SEQ ID NO:3175 (nucleotides 14-164); SEQ ID NO:3190 (229-390); and SEQ ID
NO:3281 (15-182). Twenty-four of the polynucleotides encode new members of the family
of transcription factor proteins having a basic region plus leucine zipper: SEQ ID NO:410
(42-191); SEQ ID NO:552 (116-288); SEQ ID NO:768 (116-288); SEQ ID NO:822 (108-262);
SEQ ID NO:836 (158-353); SEQ ID NO:1288 (73-234); SEQ ID NO:1365 (69-257);
SEQ ID NO:1540 (289-471); SEQ ID NO:1549 (200-391); SEQ ID NO:1556 (163-354);
SEQ ID NO:1557 (207-398); SEQ ID NO:1563 (107-298); SEQ ID NO:1622 (180-365);
SEQ ID NO:1630 (100-291); SEQ ID NO:1704 (184-372); SEQ ID NO:1808 (36-161);
SEQ ID NO:1454 (49-209); SEQ ID NO:2363 (48-211); SEQ ID NO:2424 (43-194);
SEQ ID NO:3147 (190-369); SEQ ID NO:3152 (129-320); SEQ ID NO:3158 (167-334);
and SEQ ID NO:3208 (34-256).
SEQ ID NOs:186 (175-395); 2591 (60-165); 3307 (43-321); and 3339
(94-342) encode polypeptides having an SH2 domain, and SEQ ID NOs:234 (23-121),
1832 (18-173), and 1835 (57-206) encode polypeptides having an SH3 domain. Nine
polynucleotides encode new members of the family of proteins having Ank repeat regions:
SEQ ID NO:187 (358-432); SEQ ID NO:1268 (238-315); SEQ ID NO:1804 (301-378);
SEQ ID NO:1819 (278-355); SEQ ID NO:1839 (224-307); SEQ ID NO:1830 (184-267);
SEQ ID NO:2562 (18-101); SEQ ID NO:3015 (131-214); and SEQ ID NO:3267 (97-180).

The following eleven polynucleotides encode polypeptides having a C2H2
type zinc finger: SEQ ID NOs:308 (110-172); 807 (339-392); 1324 (294-356); 1503 (154-216);
1527 (156-212); 1674 (196-258); 1779 (64-126); 1801 (295-351); 3081 (190-252);
3193 (293-355); and 3306 (161-223). Eight polynucleotides encode polypeptides of the
family of ATPases: SEQ ID NOs:431 (71-428); 639 (157-561); 2135 (2-401); 2684 (9-461);
2859 (100-320); 3178 (45-386); 3197 (281-343) and 3266 (8-139). Polypeptides
having a fibronectin type III domain are encoded by SEQ ID NOs:746 (209-427) and 1192
(186-416). Polypeptides having an EF-hand domain are encoded by SEQ ID NOs:820 (341-406);
1755 (281-367) and 3285 (16-102). Six polypeptides of the protein kinase family are
encoded by SEQ ID NOs:1157 (41-444); 1478 (54-437); 1496 (241-520); 2286 (12-182);
2969 (5-387); and 3190 (118-390).

LIM domain-containing polypeptides are encoded by SEQ ID NOs:1269
(79-240); 1309 (248-404); 1360 (222-377); and 1386 (243-398). Two polypeptides of the
family having a C2 domain (protein kinase C-like) are encoded by SEQ ID NOs:1325 (1-234)
and 2282 (183-353). Polypeptides having a WD domain, G-beta repeat motif are
encoded by SEQ ID NOs:1336 (66-164); 1380 (42-140); 1711 (263-361); 1762 (236-334);
1909 (160-258); 2218 (127-225); 3047 (191-292); 3108 (275-367) and 3292 (208-300).

SEQ ID NO:1410 (222-350) encodes a member of the trypsin family. SEQ
ID NOs:1417 (8-354); 2281 (20-387) and 2310 (20-371) encode members of the protein
tyrosine phosphatase family. SEQ ID NOs:1464 (4-180) and 1514 (2-252) encode
members of the family having an RNA recognition motif (also known as RRM, RBD, or
RNP domain). SEQ ID NOs:1496 (241-520) and 3297 (7-153) encode helicases having a
conserved C-terminal domain. SEQ ID NO:1538 (9-635) encodes a member of the wnt
family of developmental signaling proteins.

Three polynucleotides encode polypeptides having a homeobox domain:
SEQ ID NOs:1676 (9-86); 1820 (123-299); and 1821 (127-303). A novel thioredoxin is
encoded by SEQ ID NO:1677 (316-369). Two novel members of the ras family are
encoded by SEQ ID NOs:1688 (109-410) and 3258 (138-394). A novel polypeptide having a
phosphatidylinositol-specific phospholipase C Y-domain is encoded by SEQ ID NO:1707
(92-439). A novel serine carboxypeptidase is encoded by SEQ ID NO:1744 (238-433). A
novel polypeptide having N-terminal homology in the Ets domain is encoded by SEQ ID
NO:1811 (184-315). A novel polypeptide having a bromodomain is encoded by SEQ ID
NO:1814 (127-294). A novel polypeptide having a double-stranded RNA binding motif is
encoded by SEQ ID NO:1818 (9-146). A novel polypeptide having a G-protein alpha
subunit is encoded by SEQ ID NO:1846 (12-398).

SEQ ID NOs:1911 (35-151) and 1980 (60-197) encode polypeptides
having a C3HC4 type zinc finger domain (RING finger). SEQ ID NO:2065 (253-306)
encodes a polypeptide having a CCHC zinc finger domain. SEQ ID NO:2216 (90-179)
encodes a polypeptide having a WW/rsp5/WWP domain. SEQ ID NO:2428 (25-350)
encodes a polypeptide member of the dual specificity phosphatase family, having a
catalytic domain.

SEQ ID NOs:2577 (0-311); 3183 (14-215); and 3195 (0-215) encode
members of the 4 transmembrane segment integral membrane protein family. SEQ ID
NOs:2826 (116-400) and 2871 (198-392) encode polypeptides of the DEAD and DEAH
box helicase family. SEQ ID NO:2944 (18-281) encodes a polypeptide having a
calpain large subunit, domain III.

SEQ ID NO:3274 (11-187) encodes a eukaryotic transcription factor 
with a fork head domain. SEQ ID NO:3345 (65-271) encodes a polypeptide having a
PDZ domain, and SEQ ID NO:3351 (124-270) encodes a polypeptide in the family of 
phorbol esters/glycerol binding proteins. 

Described below are polynucleotide compositions encompassed by the
invention, methods for obtaining cDNA or genomic DNA encoding a full-length gene
product, expression of these polynucleotides and genes, identification of structural motifs
of the polynucleotides and genes, identification of the function of a gene product encoded
by a gene corresponding to a polynucleotide of the invention, use of the provided
polynucleotides as probes and in mapping and in tissue profiling, use of the corresponding
polypeptides and other gene products to raise antibodies, and use of the polynucleotides
and their encoded gene products for therapeutic and diagnostic purposes.

Polynucleotide Compositions 

The scope of the invention with respect to polynucleotide compositions
includes, but is not necessarily limited to, polynucleotides having a sequence set forth in
any one of SEQ ID NOs:1-3351; polynucleotides obtained from the biological materials
described herein or other biological sources (particularly human sources) by
hybridization under stringent conditions (particularly conditions of high stringency);
genes corresponding to the provided polynucleotides; variants of the provided
polynucleotides and their corresponding genes, particularly those variants that retain a
biological activity of the encoded gene product (e.g., a biological activity ascribed to a
gene product corresponding to the provided polynucleotides as a result of the
assignment of the gene product to a protein family(ies) and/or identification of a
functional domain present in the gene product). Other nucleic acid compositions
contemplated by and within the scope of the present invention will be readily apparent
to one of ordinary skill in the art when provided with the disclosure here.

"Polynucleotide" and "nucleic acid" as used herein with reference to nucleic acids of
the composition are not intended to be limiting as to the length or structure of the nucleic
acid unless specifically indicated.

The invention features polynucleotides that are expressed in human
tissue, specifically human colon, breast, and/or lung tissue. Novel nucleic acid
compositions of the invention comprise a sequence set forth in any one of SEQ ID
NOs:1-3351 or an identifying sequence thereof. An "identifying sequence" is a
contiguous sequence of residues at least about 10 nt to about 20 nt in length, usually at
least about 50 nt to about 100 nt in length, that uniquely identifies a polynucleotide
sequence, e.g., exhibits less than 90%, usually less than about 80% to about 85%
sequence identity to any contiguous nucleotide sequence of more than about 20 nt.
Thus, the subject novel nucleic acid compositions include full length cDNAs or mRNAs
that encompass an identifying sequence of contiguous nucleotides from any one of SEQ
ID NOs:1-3351.

The polynucleotides of the invention also include polynucleotides having
sequence similarity or sequence identity. Nucleic acids having sequence similarity are
detected by hybridization under low stringency conditions, for example, at 50°C and
10XSSC (0.9 M saline/0.09 M sodium citrate) and remain bound when subjected to
washing at 55°C in 1XSSC. Sequence identity can be determined by hybridization
under stringent conditions, for example, at 50°C or higher and 0.1XSSC (9 mM
saline/0.9 mM sodium citrate). Hybridization methods and conditions are well known
in the art, see, e.g., U.S. Patent No. 5,707,829. Nucleic acids that are substantially
identical to the provided polynucleotide sequences, e.g., allelic variants, genetically
altered versions of the gene, etc., bind to the provided polynucleotide sequences (SEQ
ID NOs:1-3351) under stringent hybridization conditions. By using probes, particularly
labeled probes of DNA sequences, one can isolate homologous or related genes. The
source of homologous genes can be any species, e.g., primate species, particularly
human; rodents, such as rats and mice; canines, felines, bovines, ovines, equines, yeast,
nematodes, etc.

Preferably, hybridization is performed using at least 15 contiguous
nucleotides (nt) of at least one of SEQ ID NOs:1-3351. That is, when at least 15
contiguous nt of one of the disclosed SEQ ID NOs. is used as a probe, the probe will
preferentially hybridize with a nucleic acid comprising the complementary sequence,
allowing the identification and retrieval of the nucleic acids that uniquely hybridize to
the selected probe. Probes from more than one SEQ ID NO. can hybridize with the
same nucleic acid if the cDNA from which they were derived corresponds to one
mRNA. Probes of more than 15 nt can be used, e.g., probes of from about 18 nt to
about 100 nt, but 15 nt represents sufficient sequence for unique identification.
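
The probe selection described above can be sketched in Python. This is only an illustration under stated assumptions: the function names (`reverse_complement`, `candidate_probes`, `probe_sites`) are hypothetical, the example sequence in the usage note is invented and is not one of the disclosed SEQ ID NOs, and "hybridization" is modeled as an exact match to the complementary strand, which ignores wash conditions and mismatches.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def candidate_probes(seq: str, length: int = 15):
    """Yield (start, probe) for every contiguous window of `length` nt."""
    seq = seq.upper()
    for start in range(len(seq) - length + 1):
        yield start, seq[start:start + length]


def probe_sites(probe: str, target: str):
    """Return 0-based positions in `target` that are exactly complementary
    to `probe` (an idealized stand-in for hybridization)."""
    site = reverse_complement(probe)
    target = target.upper()
    return [i for i in range(len(target) - len(site) + 1)
            if target[i:i + len(site)] == site]
```

For an invented 39-nt sequence, `list(candidate_probes(seq))` yields 25 windows of 15 nt, and `probe_sites` locates each window's complementary site in a target strand.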

The polynucleotides of the invention also include naturally occurring
variants of the nucleotide sequences (e.g., degenerate variants, allelic variants).






WO 01/02568 



PCT/US00/18374 



Variants of the polynucleotides of the invention are identified by hybridization of
putative variants with nucleotide sequences disclosed herein, preferably by
hybridization under stringent conditions. For example, by using appropriate wash
conditions, variants of the polynucleotides of the invention can be identified where the
allelic variant exhibits at most about 25-30% base pair (bp) mismatches relative to the
selected polynucleotide probe. In general, allelic variants contain 15-25% bp
mismatches, and can contain as few as 5-15%, or 2-5%, or 1-2% bp mismatches,
or even a single bp mismatch.
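
The mismatch percentages quoted above are simple position-by-position ratios. A minimal sketch, assuming the probe and the candidate variant have already been aligned to equal length without gaps (the function name `percent_mismatch` is hypothetical):

```python
def percent_mismatch(probe: str, candidate: str) -> float:
    """Percentage of positions at which two equal-length sequences differ."""
    if len(probe) != len(candidate):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not probe:
        raise ValueError("sequences must not be empty")
    mismatches = sum(1 for a, b in zip(probe.upper(), candidate.upper())
                     if a != b)
    return 100.0 * mismatches / len(probe)
```

Under these assumptions, a single mismatch in a 20-nt probe is 5%, and five mismatches in the same probe reach 25%, the lower bound of the "at most about 25-30%" limit given in the text.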

The invention also encompasses homologs corresponding to the
polynucleotides of SEQ ID NOs:1-3351, where the source of homologous genes can be
any mammalian species, e.g., primate species, particularly human; rodents, such as rats;
canines, felines, bovines, ovines, equines, yeast, nematodes, etc. Between mammalian
species, e.g., human and mouse, homologs generally have substantial sequence
similarity, e.g., at least 75% sequence identity, usually at least 90%, more usually at
least 95% between nucleotide sequences. Sequence similarity is calculated based on a
reference sequence, which may be a subset of a larger sequence, such as a conserved
motif, coding region, flanking region, etc. A reference sequence will usually be at least
about 18 contiguous nt long, more usually at least about 30 nt long, and may extend to
the complete sequence that is being compared. Algorithms for sequence analysis are
known in the art, such as BLAST, described in Altschul et al., J. Mol. Biol. (1990)
215:403-10.

In general, variants of the invention have a sequence identity greater than
at least about 65%, preferably at least about 75%, more preferably at least about 85%,
and can be greater than at least about 90%, 91%, 92%, 93%, 94%, 95%, or 96%, most
preferably 97%, 98% or 99%. For the purposes of this invention, a preferred method of
calculating percent identity is the Smith-Waterman algorithm, applied as follows.
Global DNA sequence identity must be greater than 65% as determined by the
Smith-Waterman homology search algorithm as implemented in MPSRCH program (Oxford
Molecular) using an affine gap search with the following search parameters: gap open
penalty, 12; and gap extension penalty, 1.
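
To illustrate what an affine gap search with these penalties computes, the following is a minimal sketch of Smith-Waterman local alignment scoring with affine gaps (the Gotoh recurrence). It is not the MPSRCH implementation. The match and mismatch scores (+5 and -4) are assumptions, since the text specifies only the gap penalties, and the convention that a gap of length k costs gap_open + (k - 1) * gap_extend is likewise an assumption. The sketch returns the best local score only and does not derive a percent identity.

```python
def smith_waterman_affine(a: str, b: str, match: int = 5, mismatch: int = -4,
                          gap_open: int = 12, gap_extend: int = 1) -> int:
    """Best local alignment score between a and b with affine gap costs.

    Assumed convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    h_prev = [0] * (m + 1)        # H: best score of an alignment ending at (i-1, j)
    f_prev = [neg_inf] * (m + 1)  # F: best score ending with a gap in b
    best = 0
    for i in range(1, n + 1):
        h_cur = [0] * (m + 1)
        f_cur = [neg_inf] * (m + 1)
        e = neg_inf               # E: best score ending with a gap in a
        for j in range(1, m + 1):
            # Open a new gap from H, or extend the gap that is already open.
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            score = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero.
            h_cur[j] = max(0, h_prev[j - 1] + score, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best
```

With these assumed scores, two identical 20-nt sequences score 100, and inserting three extra nucleotides into the middle of one of them lowers the best score by 12 + 2 = 14.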

The subject nucleic acids can be cDNAs or genomic DNAs, as well as
fragments thereof, particularly fragments that encode a biologically active gene product
and/or are useful in the methods disclosed herein (e.g., in diagnosis, as a unique
identifier of a differentially expressed gene of interest, etc.). The term "cDNA" as used
herein is intended to include all nucleic acids that share the arrangement of sequence
elements found in native mature mRNA species, where sequence elements are exons
and 3' and 5' non-coding regions. Normally mRNA species have contiguous exons,
with the intervening introns, when present, being removed by nuclear RNA splicing, to
create a continuous open reading frame encoding a polypeptide of the invention.
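
The relationship between exons, introns, and the continuous open reading frame can be sketched as follows. This is an illustration only: the exon coordinates and the toy genomic sequence in the usage note are invented, the function names are hypothetical, and the reading frame search is the simplest possible one (first ATG followed by an in-frame stop codon).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice_exons(genomic: str, exons):
    """Join exon intervals (0-based, end-exclusive) in order, dropping introns."""
    return "".join(genomic[start:end] for start, end in exons)


def first_orf(mrna: str):
    """Return the first ATG-to-stop open reading frame, or None if absent."""
    mrna = mrna.upper()
    start = mrna.find("ATG")
    while start != -1:
        # Walk codon by codon from this ATG looking for an in-frame stop.
        for pos in range(start, len(mrna) - 2, 3):
            if mrna[pos:pos + 3] in STOP_CODONS:
                return mrna[start:pos + 3]
        start = mrna.find("ATG", start + 1)
    return None
```

For the toy sequence `"CC" + "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA" + "GG"` with exons `[(0, 8), (20, 28)]`, splicing removes the 12-nt intron and `first_orf` returns `"ATGGCCAAATGA"`.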

A genomic sequence of interest comprises the nucleic acid present
between the initiation codon and the stop codon, as defined in the listed sequences,
including all of the introns that are normally present in a native chromosome. It can
further include the 3' and 5' untranslated regions found in the mature mRNA. It can
further include specific transcriptional and translational regulatory sequences, such as
promoters, enhancers, etc., including about 1 kb, but possibly more, of flanking
genomic DNA at either the 5' or 3' end of the transcribed region. The genomic DNA
can be isolated as a fragment of 100 kbp or smaller, and substantially free of flanking
chromosomal sequence. The genomic DNA flanking the coding region, either 3' or
5', or internal regulatory sequences as sometimes found in introns, contains sequences
required for proper tissue, stage-specific, or disease-state specific expression.

The nucleic acid compositions of the subject invention can encode all or
a part of the subject polypeptides. Double or single stranded fragments can be obtained
from the DNA sequence by chemically synthesizing oligonucleotides in accordance
with conventional methods, by restriction enzyme digestion, by PCR amplification, etc.
Isolated polynucleotides and polynucleotide fragments of the invention comprise at
least about 10, about 15, about 20, about 35, about 50, about 100, about 150 to about
200, about 250 to about 300, or about 350 contiguous nt selected from the
polynucleotide sequences as shown in SEQ ID NOs:1-3351. The fragments also
include those of lengths intermediate to the specifically mentioned lengths, such as 35,
36, 37, 38, 39, etc.; 150, 151, 152, 153, 154, etc. For the most part, fragments will be of
at least 15 nt, usually at least 18 nt or 25 nt, and up to at least about 50 contiguous nt in
length or more. In a preferred embodiment, the polynucleotide molecules comprise a
contiguous sequence of at least 12 nt selected from the group consisting of the
polynucleotides shown in SEQ ID NOs:1-3351.

Probes specific to the polynucleotides of the invention can be generated
using the polynucleotide sequences disclosed in SEQ ID NOs:1-3351. The probes are
preferably at least about a 12, 15, 16, 18, 20, 22, 24, or 25 nt fragment of a
corresponding contiguous sequence of SEQ ID NOs:1-3351, and can be less than 2, 1,
0.5, 0.1, or 0.05 kb in length. The probes can be synthesized chemically or can be
generated from longer polynucleotides using restriction enzymes. The probes can be
labeled, for example, with a radioactive, biotinylated, or fluorescent tag. Preferably,
probes are designed based upon an identifying sequence of a polynucleotide of one of
SEQ ID NOs:1-3351. More preferably, probes are designed based on a contiguous
sequence of one of the subject polynucleotides that remains unmasked following
application of a masking program for masking low complexity (e.g., XBLAST) to the
sequence, i.e., one would select an unmasked region, as indicated by the
polynucleotides outside the poly-n stretches of the masked sequence produced by the
masking program.
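
Selecting an unmasked region from a masked sequence can be sketched as follows. The sketch assumes the masking program has replaced low-complexity stretches with runs of the letter N (the "poly-n stretches" mentioned above); the function name `unmasked_regions` and the default minimum length of 15 nt are illustrative choices, not part of the disclosure.

```python
import re


def unmasked_regions(masked_seq: str, min_length: int = 15):
    """Return (start, end, subsequence) for each run that contains no N,
    keeping only runs long enough to supply a probe of `min_length` nt."""
    regions = []
    for match in re.finditer(r"[^Nn]+", masked_seq):
        if match.end() - match.start() >= min_length:
            regions.append((match.start(), match.end(), match.group()))
    return regions
```

Coordinates are 0-based and end-exclusive. Runs shorter than `min_length`, such as a 4-nt island between two masked stretches, are dropped.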

The polynucleotides of the subject invention are isolated and obtained in
substantial purity, generally as other than an intact chromosome. Usually, the
polynucleotides, either as DNA or RNA, will be obtained substantially free of other
naturally-occurring nucleic acid sequences, generally being at least about 50%, usually
at least about 90% pure and are typically "recombinant", e.g., flanked by one or more
nucleotides with which it is not normally associated on a naturally occurring
chromosome.

The polynucleotides of the invention can be provided as a linear
molecule or within a circular molecule, and can be provided within autonomously
replicating molecules (vectors) or within molecules without replication sequences.
Expression of the polynucleotides can be regulated by their own or by other regulatory
sequences known in the art. The polynucleotides of the invention can be introduced
into suitable host cells using a variety of techniques available in the art, such as
transferrin polycation-mediated DNA transfer, transfection with naked or encapsulated
nucleic acids, liposome-mediated DNA transfer, intracellular transportation of DNA-coated
latex beads, protoplast fusion, viral infection, electroporation, gene gun, calcium
phosphate-mediated transfection, and the like.

The subject nucleic acid compositions can be used, for example, to
produce polypeptides, as probes for the detection of mRNA of the invention in
biological samples (e.g., extracts of human cells), to generate additional copies of the
polynucleotides, to generate ribozymes or antisense oligonucleotides, and as single
stranded DNA probes or as triple-strand forming oligonucleotides. The probes
described herein can be used to, for example, determine the presence or absence of the
polynucleotide sequences as shown in SEQ ID NOs:1-3351 or variants thereof in a
sample. These and other uses are described in more detail below.

Use of Polynucleotides to Obtain Full-Length cDNA, Gene, and Promoter Region

Full-length cDNA molecules comprising the disclosed polynucleotides
are obtained as follows. A polynucleotide having a sequence of one of SEQ ID NOs:1-3351,
or a portion thereof comprising at least 12, 15, 18, or 20 nt, is used as a
hybridization probe to detect hybridizing members of a cDNA library using probe
design methods, cloning methods, and clone selection techniques such as those
described in U.S. Patent No. 5,654,173. Libraries of cDNA are made from selected
tissues, such as normal or tumor tissue, or from tissues of a mammal treated with, for
example, a pharmaceutical agent. Preferably, the tissue is the same as the tissue from
which the polynucleotides of the invention were isolated, as both the polynucleotides
described herein and the cDNA represent expressed genes. Most preferably, the cDNA
library is made from the biological material described herein in the Examples. The
choice of cell type for library construction can be made after the identity of the protein
encoded by the gene corresponding to the polynucleotide of the invention is known.
This will indicate which tissue and cell types are likely to express the related gene, and
thus represent a suitable source for the mRNA for generating the cDNA. As described
in the Examples, cDNA of the invention was isolated from specific cell or tissue types,
and such cells and tissues are preferable for obtaining related nucleic acids.

Techniques for producing and probing nucleic acid sequence libraries are
described, for example, in Sambrook et al., Molecular Cloning: A Laboratory Manual,
2nd Ed., (1989) Cold Spring Harbor Press, Cold Spring Harbor, NY. The cDNA can be
prepared by using primers based on sequence from SEQ ID NOs:1-3351. In one
embodiment, the cDNA library can be made from only poly-adenylated mRNA. Thus,
poly-T primers can be used to prepare cDNA from the mRNA.

Members of the library that are larger than the provided polynucleotides,
and preferably that encompass the complete coding sequence of the native message, are
obtained. In order to confirm that the entire cDNA has been obtained, RNA protection
experiments are performed as follows. Hybridization of a full-length cDNA to an
mRNA will protect the RNA from RNase degradation. If the cDNA is not full length,
then the portions of the mRNA that are not hybridized will be subject to RNase
degradation. This is assayed, as is known in the art, by changes in electrophoretic
mobility on polyacrylamide gels, or by detection of released monoribonucleotides.
Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., (1989) Cold
Spring Harbor Press, Cold Spring Harbor, NY. In order to obtain additional sequences
5' to the end of a partial cDNA, 5' RACE (PCR Protocols: A Guide to Methods and
Applications, (1990) Academic Press, Inc.) can be performed.

Genomic DNA is isolated using the provided polynucleotides in a
manner similar to the isolation of full-length cDNAs. Briefly, the provided
polynucleotides, or portions thereof, are used as probes to libraries of genomic DNA.
Preferably, the library is obtained from the cell type that was used to generate the
polynucleotides of the invention, but this is not essential. Most preferably, the genomic
DNA is obtained from the biological material described herein in the Examples. Such
libraries can be in vectors suitable for carrying large segments of a genome, such as P1
or YAC, as described in detail in Sambrook et al., 9.4-9.30. In addition, genomic
sequences can be isolated from human BAC libraries, which are commercially available
from Research Genetics, Inc., Huntsville, Alabama, USA, for example. In order to
obtain additional 5' or 3' sequences, chromosome walking is performed, as described in
Sambrook et al., such that adjacent and overlapping fragments of genomic DNA are
isolated. These are mapped and pieced together, as is known in the art, using restriction
digestion enzymes and DNA ligase.

Using the polynucleotide sequences of the invention, corresponding
full-length genes can be isolated using both classical and PCR methods to construct and
probe cDNA libraries. Using either method, Northern blots, preferably, are performed
on a number of cell types to determine which cell lines express the gene of interest at
the highest level. Classical methods of constructing cDNA libraries are taught in
Sambrook et al., supra. With these methods, cDNA can be produced from mRNA and
inserted into viral or expression vectors. Typically, libraries of mRNA comprising
poly(A) tails can be produced with poly(T) primers. Similarly, cDNA libraries can be
produced using the instant sequences as primers.

PCR methods are used to amplify the members of a cDNA library that
comprise the desired insert. In this case, the desired insert will contain sequence from
the full length cDNA that corresponds to the instant polynucleotides. Such PCR
methods include gene trapping and RACE methods as described in Gruber et al., WO
95/04745 and Gruber et al., U.S. Patent No. 5,500,356. Kits are commercially available
to perform gene trapping experiments from, for example, Life Technologies,
Gaithersburg, Maryland, USA. In preferred embodiments of RACE, a common primer
is designed to anneal to an arbitrary adaptor sequence ligated to cDNA ends (Apte and
Siebert, Biotechniques (1993) 15:890-893; Edwards et al., Nuc. Acids Res. (1991)
19:5227-5232). When a single gene-specific RACE primer is paired with the common
primer, preferential amplification of sequences between the single gene specific primer
and the common primer occurs. Commercial cDNA pools modified for use in RACE
are available.

The promoter region of a gene generally is located 5' to the initiation site
for RNA polymerase II. Hundreds of promoter regions contain the "TATA" box, a
sequence such as TATTA or TATAA, which is sensitive to mutations. The promoter
region can be obtained by performing 5' RACE using a primer from the coding region
of the gene. Alternatively, the cDNA can be used as a probe for the genomic sequence,
and the region 5' to the coding region is identified by "walking up." If the gene is
highly expressed or differentially expressed, the promoter from the gene can be of use
in a regulatory construct for a heterologous gene.

Once the full-length cDNA or gene is obtained, DNA encoding variants
can be prepared by site-directed mutagenesis, described in detail in Sambrook et al.,
15.3-15.63. The choice of codon or nucleotide to be replaced can be based on disclosure
herein on optional changes in amino acids to achieve altered protein structure and/or
function.

As an alternative method to obtaining DNA or RNA from a biological
material, nucleic acid comprising nucleotides having the sequence of one or more
polynucleotides of the invention can be synthesized. Thus, the invention encompasses
nucleic acid molecules ranging in length from 15 nt (corresponding to at least 15
contiguous nt of one of SEQ ID NOs:1-3351) up to a maximum length suitable for one
or more biological manipulations, including replication and expression, of the nucleic
acid molecule. The invention includes but is not limited to (a) nucleic acid having the
size of a full gene, and comprising at least one of SEQ ID NOs:1-3351; (b) the nucleic
acid of (a) also comprising at least one additional polynucleotide or gene, operably
linked to permit expression of a fusion protein; (c) an expression vector comprising (a)
or (b); (d) a plasmid comprising (a) or (b); and (e) a recombinant viral particle
comprising (a) or (b). Once provided with the polynucleotides disclosed herein,
construction or preparation of (a)-(e) is well within the skill in the art.

The sequence of a nucleic acid comprising at least 15 contiguous nt of at
least any one of SEQ ID NOs:1-3351, preferably the entire sequence of at least any one
of SEQ ID NOs:1-3351, is not limited and can be any sequence of A, T, G, and/or C
(for DNA) and A, U, G, and/or C (for RNA) or modified bases thereof, including
inosine and pseudouridine. The choice of sequence will depend on the desired function
and can be dictated by coding regions desired, the intron-like regions desired, and the
regulatory regions desired. Where the entire sequence of any one of SEQ ID NOs:1-3351
is within the nucleic acid, the nucleic acid obtained is referred to herein as a
polynucleotide comprising the sequence of any one of SEQ ID NOs:1-3351.

Expression of Polypeptide Encoded by Full-Length cDNA or Full-Length Gene

The provided polynucleotides (e.g., a polynucleotide having a sequence
of one of SEQ ID NOs:1-3351), the corresponding cDNA, or the full-length gene is
used to express a partial or complete gene product. Constructs of polynucleotides
having sequences of SEQ ID NOs:1-3351 can be generated synthetically. Alternatively,
single-step assembly of a gene and entire plasmid from large numbers of
oligodeoxyribonucleotides is described by, e.g., Stemmer et al., Gene (Amsterdam)
(1995) 164(1):49-53. This method, assembly PCR (the synthesis of long DNA
sequences from large numbers of oligodeoxyribonucleotides (oligos)), is
derived from DNA shuffling (Stemmer, Nature (1994) 370:389-391), and
does not rely on DNA ligase, but instead relies on DNA polymerase to build
increasingly longer DNA fragments during the assembly process.
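
The input to assembly PCR is a set of overlapping oligos that tile the target sequence on alternating strands, so that polymerase can extend each oligo across its neighbor. The following is a hedged sketch of that tiling step only. The oligo length (40 nt) and overlap (20 nt) are assumed defaults chosen for illustration and are not taken from the text, and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def design_assembly_oligos(seq: str, oligo_len: int = 40, overlap: int = 20):
    """Tile `seq` with oligos that overlap their neighbors by `overlap` nt.

    Even-numbered oligos are top-strand; odd-numbered oligos are written as
    the reverse complement, so adjacent oligos can anneal to each other.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    for index, start in enumerate(range(0, len(seq) - overlap, step)):
        piece = seq[start:start + oligo_len].upper()
        oligos.append(piece if index % 2 == 0 else reverse_complement(piece))
    return oligos
```

With the assumed defaults, a 100-nt target yields four oligos covering positions 0-40, 20-60, 40-80, and 60-100.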

Appropriate polynucleotide constructs are purified using standard
recombinant DNA techniques as described in, for example, Sambrook et al., Molecular
Cloning: A Laboratory Manual, 2nd Ed., (1989) Cold Spring Harbor Press, Cold Spring
Harbor, NY, and under current regulations described in United States Dept. of HHS,
National Institutes of Health (NIH) Guidelines for Recombinant DNA Research. The
gene product encoded by a polynucleotide of the invention is expressed in any
expression system, including, for example, bacterial, yeast, insect, amphibian and
mammalian systems. Vectors, host cells and methods for obtaining expression in same
are well known in the art. Suitable vectors and host cells are described in U.S. Patent
No. 5,654,173.

Polynucleotide molecules comprising a polynucleotide sequence
provided herein are generally propagated by placing the molecule in a vector. Viral and
non-viral vectors are used, including plasmids. The choice of plasmid will depend on
the type of cell in which propagation is desired and the purpose of propagation. Certain
vectors are useful for amplifying and making large amounts of the desired DNA
sequence. Other vectors are suitable for expression in cells in culture. Still other
vectors are suitable for transfer and expression in cells in a whole animal or person. The
choice of appropriate vector is well within the skill of the art. Many such vectors are
available commercially. Methods for preparation of vectors comprising a desired
sequence are well known in the art.

The polynucleotides set forth in SEQ ID NOs: 1-3351 or their
corresponding full-length polynucleotides are linked to regulatory sequences as
appropriate to obtain the desired expression properties. These can include promoters
(attached either at the 5' end of the sense strand or at the 3' end of the antisense strand),
enhancers, terminators, operators, repressors, and inducers. The promoters can be
regulated or constitutive. In some situations it may be desirable to use conditionally
active promoters, such as tissue-specific or developmental stage-specific promoters.
These are linked to the desired nucleotide sequence using the techniques described
above for linkage to vectors. Any techniques known in the art can be used.

When any appropriate host cells or organisms are used to replicate
and/or express the polynucleotides or nucleic acids of the invention, the resulting
replicated nucleic acid, RNA, expressed protein or polypeptide is within the scope of
the invention as a product of the host cell or organism. The product is recovered by any
appropriate means known in the art.

Once the gene corresponding to a selected polynucleotide is identified,
its expression can be regulated in the cell to which the gene is native. For example, an
endogenous gene of a cell can be regulated by an exogenous regulatory sequence as
disclosed in U.S. Patent No. 5,641,670.

Identification of Functional and Structural Motifs of Novel Genes 

Translations of the nucleotide sequence of the provided polynucleotides,
cDNAs or full genes can be aligned with individual known sequences. Similarity with
individual sequences can be used to determine the activity of the polypeptides encoded
by the polynucleotides of the invention. Also, sequences exhibiting similarity with
more than one individual sequence can exhibit activities that are characteristic of either
or both individual sequences.

The full length sequences and fragments of the polynucleotide sequences
of the nearest neighbors can be used as probes and primers to identify and isolate the
full length sequence corresponding to provided polynucleotides. The nearest neighbors
can indicate a tissue or cell type to be used to construct a library for the full-length
sequences corresponding to the provided polynucleotides.

Typically, a selected polynucleotide is translated in all six frames to
determine the best alignment with the individual sequences. The sequences disclosed
herein in the Sequence Listing are in a 5' to 3' orientation and translation in three
frames can be sufficient. These amino acid sequences are referred to, generally, as
query sequences, which will be aligned with the individual sequences. Databases with
individual sequences are described in "Computer Methods for Macromolecular
Sequence Analysis" Methods in Enzymology (1996) 266, Doolittle, Academic Press,
Inc., a division of Harcourt Brace & Co., San Diego, California, USA. Databases
include Genbank, EMBL, and DNA Database of Japan (DDBJ).
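The six-frame translation step can be sketched as follows. This is a minimal illustration, assuming the standard genetic code and an unambiguous A/C/G/T input; the function names are hypothetical and are not part of any program cited herein.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string, written 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict of the three forward (+1..+3) and three reverse (-1..-3) frames."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Where the orientation of the sequence is known to be 5' to 3', only the "+" frames need to be examined, which corresponds to the three-frame case mentioned above.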

Query and individual sequences can be aligned using the methods and
computer programs described above, which include BLAST, available over the world
wide web at http://www.ncbi.nlm.nih.gov/BLAST. Another alignment algorithm is
Fasta, available in the Genetics Computing Group (GCG) package, Madison,
Wisconsin, USA, a wholly owned subsidiary of Oxford Molecular Group, Inc. Other
techniques for alignment are described in Doolittle, supra. Preferably, an alignment
program that permits gaps in the sequence is utilized to align the sequences. The
Smith-Waterman algorithm is one type of algorithm that permits gaps in sequence alignments.
See Meth. Mol. Biol. (1997) 70:173-187. Also, the GAP program using the Needleman
and Wunsch alignment method can be utilized to align sequences. An alternative search
strategy uses MPSRCH software, which runs on a MASPAR computer. MPSRCH uses
a Smith-Waterman algorithm to score sequences on a massively parallel computer.
This approach improves the ability to identify distantly related matches,
and is especially tolerant of small gaps and nucleotide sequence errors. Amino acid
sequences encoded by the provided polynucleotides can be used to search both protein
and DNA databases.
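To illustrate how a gapped local alignment of the Smith-Waterman type operates, a minimal sketch is given below. It assumes a simple match/mismatch score and a linear gap penalty; these parameter values are illustrative assumptions, and the sketch is not the implementation used by MPSRCH, BestFit, or any other program named herein.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b), with '-' marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; scores never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

In practice a substitution matrix and affine gap penalties would typically replace the fixed scores used here.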

High Similarity. In general, in alignment results considered to be of high
similarity, the alignment region length is typically at least about 55% of the
total length of the query sequence; more typically, at least about 58%; even more typically, at
least about 60% of the total residue length of the query sequence. Usually, percent
length of the alignment region can be as much as about 62%; more usually, as much as
about 64%; even more usually, as much as about 66%. Further, for high similarity, the
region of alignment, typically, exhibits at least about 75% sequence identity; more
typically, at least about 78%; even more typically, at least about 80% sequence identity.
Usually, percent sequence identity can be as much as about 82%; more usually, as much
as about 84%; even more usually, as much as about 86%.

The p value is used in conjunction with these methods. If high similarity
is found, the query sequence is considered to have high similarity with a profile
sequence when the p value is less than or equal to about 10^-2; more usually, less than or
equal to about 10^-3; even more usually, less than or equal to about 10^-4. More typically,
the p value is no more than about 10^-5; more typically, no more than or equal to about
10^-10; even more typically, no more than or equal to about 10^-15 for the query sequence
to be considered high similarity.
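The high-similarity criteria above combine three quantities: alignment coverage of the query, percent identity within the aligned region, and the p value. A minimal sketch of such a test follows, using the least stringent thresholds recited above as defaults. The function name and argument layout are hypothetical.

```python
def is_high_similarity(align_len, query_len, identity, p_value,
                       min_coverage=0.55, min_identity=0.75, max_p=1e-2):
    """Apply the three high-similarity criteria to one alignment result.

    align_len  -- length of the aligned region, in residues
    query_len  -- total length of the query sequence, in residues
    identity   -- fraction of identical residues within the aligned region
    p_value    -- p value reported by the alignment program
    """
    coverage = align_len / query_len
    return (coverage >= min_coverage
            and identity >= min_identity
            and p_value <= max_p)
```

More stringent settings (for example `min_coverage=0.60`, `min_identity=0.80`, `max_p=1e-5`) correspond to the "even more typically" values recited above.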

Similarity Determined by Sequence Identity Alone. Sequence identity
alone can be used to determine similarity of a query sequence to an individual sequence
and can indicate the activity of the sequence. Such an alignment, preferably, permits
gaps to align sequences. Typically, the query sequence is related to the profile sequence
if the sequence identity over the entire query sequence is at least about 15%; more
typically, at least about 20%; even more typically, at least about 25%; even more
typically, at least about 50%. Sequence identity alone as a measure of similarity is most
useful when the query sequence is usually at least 80 residues in length; more usually,
90 residues; even more usually, at least 95 amino acid residues in length. More
typically, similarity can be concluded based on sequence identity alone when the query
sequence is preferably 100 residues in length; more preferably, 120 residues in length;
even more preferably, 150 amino acid residues in length.

Alignments with Profile and Multiple Aligned Sequences. Translations
of the provided polynucleotides can be aligned with amino acid profiles that define
either protein families or common motifs. Also, translations of the provided
polynucleotides can be aligned to multiple sequence alignments (MSA) comprising the
polypeptide sequences of members of protein families or motifs. Similarity or identity
with profile sequences or MSAs can be used to determine the activity of the gene
products (e.g., polypeptides) encoded by the provided polynucleotides or corresponding
cDNA or genes. For example, sequences that show an identity or similarity with a
chemokine profile or MSA can exhibit chemokine activities.

Profiles can be designed manually by (1) creating an MSA, which is an
alignment of the amino acid sequences of members that belong to the family, and (2)
constructing a statistical representation of the alignment. Such methods are described,
for example, in Birney et al., Nucl. Acid Res. (1996) 24(14):2730-2739. MSAs of some
protein families and motifs are publicly available. MSAs are described also in
Sonnhammer et al., Proteins (1997) 28:405-420. A brief description of MSAs is
reported in Pascarella et al., Prot. Eng. (1996) 9(3):249-251. Techniques for building
profiles from MSAs are described in Sonnhammer et al., supra; Birney et al., supra;
and "Computer Methods for Macromolecular Sequence Analysis," Methods in
Enzymology (1996) 266, Doolittle, Academic Press, Inc., San Diego, California, USA.
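As a simple illustration of a "statistical representation" of an MSA, the sketch below builds a per-column residue frequency table and scores an ungapped query against it. This is an assumption made for illustration only: the cited references describe more elaborate models (for example hidden Markov models), and the function names are hypothetical.

```python
from collections import Counter


def build_profile(msa):
    """Return a list with one {residue: frequency} dict per alignment column.

    msa is a list of equal-length aligned sequences; '-' marks a gap and is
    ignored when computing frequencies.
    """
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in msa if row[col] != "-")
        total = sum(counts.values())
        profile.append({res: n / total for res, n in counts.items()} if total else {})
    return profile


def score_against_profile(seq, profile):
    """Mean profile frequency of the residues observed in an ungapped query."""
    if len(seq) != len(profile):
        raise ValueError("query must have the same length as the profile")
    return sum(col.get(res, 0.0) for res, col in zip(seq, profile)) / len(profile)
```

A query that matches the most frequent residue in every column scores close to 1.0, while an unrelated sequence scores close to 0.0.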

Similarity between a query sequence and a protein family or motif can be
determined by (a) comparing the query sequence against the profile and/or (b) aligning
the query sequence with the members of the family or motif. Typically, a program such
as Searchwise is used to compare the query sequence to the statistical representation of
the multiple alignment, also known as a profile (see Birney et al., supra). Other
techniques to compare the sequence and profile are described in Sonnhammer et al.,
supra and Doolittle, supra.

Next, methods described by Feng et al., J. Mol. Evol. (1987) 25:351 and
Higgins et al., CABIOS (1989) 5:151 can be used to align the query sequence with the
members of a family or motif, also known as an MSA. Sequence alignments can be
generated using any of a variety of software tools. Examples include PileUp, which
creates a multiple sequence alignment, and is described in Feng et al., J. Mol. Evol.
(1987) 25:351. Another method, GAP, uses the alignment method of Needleman et al.,
J. Mol. Biol. (1970) 48:443. GAP is best suited for global alignment of sequences. A
third method, BestFit, functions by inserting gaps to maximize the number of matches
using the local homology algorithm of Smith et al., Adv. Appl. Math. (1981) 2:482. In
general, the following factors are used to determine if a similarity between a query
sequence and a profile or MSA exists: (1) number of conserved residues found in the
query sequence, (2) percentage of conserved residues found in the query sequence, (3)
number of frameshifts, and (4) spacing between conserved residues.

Some alignment programs that both translate and align sequences can
make any number of frameshifts when translating the nucleotide sequence to produce
the best alignment. The fewer frameshifts needed to produce an alignment, the stronger
the similarity or identity between the query and profile or MSAs. For example, a weak
similarity resulting from no frameshifts can be a better indication of activity or structure
of a query sequence than a strong similarity resulting from two frameshifts. Preferably,
three or fewer frameshifts are found in an alignment; more preferably, two or fewer
frameshifts; even more preferably, one or fewer frameshifts; even more preferably, no
frameshifts are found in an alignment of query and profile or MSAs.

Conserved residues are those amino acids found at a particular position
in all or some of the family or motif members. Alternatively, a position is considered
conserved if only a certain class of amino acids is found in a particular position in all or
some of the family members. For example, the N-terminal position can contain a
positively charged amino acid, such as lysine, arginine, or histidine.

Typically, a residue of a polypeptide is conserved when a class of amino
acids or a single amino acid is found at a particular position in at least about 40% of all
class members; more typically, at least about 50%; even more typically, at least about
60% of the members. Usually, a residue is conserved when a class or single amino acid
is found in at least about 70% of the members of a family or motif; more usually, at
least about 80%; even more usually, at least about 90%; even more usually, at least
about 95%.

A residue is considered conserved when three unrelated amino acids are
found at a particular position in some or all of the members; more usually, two
unrelated amino acids. These residues are conserved when the unrelated amino acids
are found at particular positions in at least about 40% of all class members; more
typically, at least about 50%; even more typically, at least about 60% of the members.
Usually, a residue is conserved when a class or single amino acid is found in at least
about 70% of the members of a family or motif; more usually, at least about 80%; even
more usually, at least about 90%; even more usually, at least about 95%.

A query sequence has similarity to a profile or MSA when the query
sequence comprises at least about 25% of the conserved residues of the profile or MSA;
more usually, at least about 30%; even more usually, at least about 40%. Typically, the
query sequence has a stronger similarity to a profile sequence or MSA when the query
sequence comprises at least about 45% of the conserved residues of the profile or MSA;
more typically, at least about 50%; even more typically, at least about 55%.
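The conserved-residue criteria can be sketched as two steps: first mark the MSA columns in which a single amino acid is found in at least a chosen fraction of members, then compute what fraction of those conserved positions the aligned query shares. The sketch below is a simplified illustration that considers only single amino acids (not amino acid classes) and assumes the query is already aligned to the MSA columns; the function names are hypothetical.

```python
from collections import Counter


def conserved_positions(msa, min_fraction=0.40):
    """Map column index -> residue for columns meeting the conservation threshold.

    A column counts as conserved when its most common residue (gaps ignored)
    occurs in at least min_fraction of all family members.
    """
    n_members = len(msa)
    conserved = {}
    for col in range(len(msa[0])):
        counts = Counter(row[col] for row in msa if row[col] != "-")
        if counts:
            residue, count = counts.most_common(1)[0]
            if count / n_members >= min_fraction:
                conserved[col] = residue
    return conserved


def fraction_conserved_matched(aligned_query, conserved):
    """Fraction of conserved positions at which the query carries the conserved residue."""
    if not conserved:
        return 0.0
    hits = sum(1 for col, res in conserved.items() if aligned_query[col] == res)
    return hits / len(conserved)
```

A returned fraction of at least about 0.25 would correspond to the minimum similarity threshold recited above, and a fraction of at least about 0.45 to the stronger similarity threshold.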

Identification of Secreted and Membrane-Bound Polypeptides 

Both secreted and membrane-bound polypeptides of the present
invention are of particular interest. For example, levels of secreted polypeptides can be
assayed in body fluids that are convenient, such as blood, plasma, serum, and other
body fluids such as urine, prostatic fluid and semen. Membrane-bound polypeptides are
useful for constructing vaccine antigens or inducing an immune response. Such
antigens would comprise all or part of the extracellular region of the membrane-bound
polypeptides. Because both secreted and membrane-bound polypeptides comprise a
fragment of contiguous hydrophobic amino acids, hydrophobicity predicting algorithms
can be used to identify such polypeptides.

A signal sequence is usually encoded by both secreted and membrane-bound
polypeptide genes to direct a polypeptide to the surface of the cell. The signal
sequence usually comprises a stretch of hydrophobic residues. Such signal sequences
can fold into helical structures. Membrane-bound polypeptides typically comprise at
least one transmembrane region that possesses a stretch of hydrophobic amino acids that
can traverse the membrane. Some transmembrane regions also exhibit a helical
structure. Hydrophobic fragments within a polypeptide can be identified by using
computer algorithms. Such algorithms include Hopp & Woods, Proc. Natl. Acad. Sci.
USA (1981) 78:3824-3828; Kyte & Doolittle, J. Mol. Biol. (1982) 157:105-132; and
RAOAR algorithm, Degli Esposti et al., Eur. J. Biochem. (1990) 190:207-219.
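A sliding-window hydropathy profile of the kind described by Kyte & Doolittle can be sketched as follows. The per-residue values are the published Kyte-Doolittle hydropathy indices; the window length of 9 is an illustrative assumption, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Return the mean hydropathy of every full window along the sequence.

    The result has len(protein) - window + 1 entries, or is empty when the
    sequence is shorter than the window.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

Windows with a strongly positive mean indicate hydrophobic stretches that are candidates for signal sequences or transmembrane regions.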

Another method of identifying secreted and membrane-bound
polypeptides is to translate the polynucleotides of the invention in all six frames and
determine if at least 8 contiguous hydrophobic amino acids are present. Those
translated polypeptides with at least 8; more typically, 10; even more typically, 12
contiguous hydrophobic amino acids are considered to be either a putative secreted or
membrane-bound polypeptide. Hydrophobic amino acids include alanine, glycine,
histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, threonine,
tryptophan, tyrosine, and valine.
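The contiguous-hydrophobic-residue test can be sketched as follows, using the set of hydrophobic amino acids exactly as listed above (in one-letter code) and a default threshold of 8 residues. The function names are hypothetical.

```python
# One-letter codes for the hydrophobic amino acids listed in the text:
# Ala, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Thr, Trp, Tyr, Val.
HYDROPHOBIC = set("AGHILKMFPTWYV")


def longest_hydrophobic_run(protein):
    """Length of the longest stretch of contiguous hydrophobic residues."""
    best = current = 0
    for aa in protein.upper():
        current = current + 1 if aa in HYDROPHOBIC else 0
        best = max(best, current)
    return best


def is_putative_secreted_or_membrane(protein, min_run=8):
    """True when the translated polypeptide has at least min_run contiguous hydrophobic residues."""
    return longest_hydrophobic_run(protein) >= min_run
```

Setting `min_run` to 10 or 12 applies the more stringent thresholds recited above. The test would be applied to each translated frame of a polynucleotide in turn.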

Identification of the Function of an Expression Product of a Full-Length Gene 
Ribozymes, antisense constructs, and dominant negative mutants can be
used to determine function of the expression product of a gene corresponding to a
polynucleotide provided herein. The phosphoramidite method of oligonucleotide
synthesis can be used to construct antisense molecules and ribozymes. See Beaucage et
al., Tet. Lett. (1981) 22:1859 and U.S. Patent No. 4,668,777. Automated devices for
synthesis are available to create oligonucleotides using this chemistry. Examples of
such devices include Biosearch 8600, Models 392 and 394 by Applied Biosystems, a
division of Perkin-Elmer Corp., Foster City, California, USA; and Expedite by
Perceptive Biosystems, Framingham, Massachusetts, USA. Synthetic RNA, phosphate
analog oligonucleotides, and chemically derivatized oligonucleotides can also be
produced, and can be covalently attached to other molecules. RNA oligonucleotides
can be synthesized, for example, using RNA phosphoramidites. This method can be
performed on an automated synthesizer, such as Applied Biosystems, Models 392 and
394, Foster City, California, USA.

Oligonucleotides of up to 200 nt can be synthesized, more typically, 100
nt, more typically 50 nt; even more typically 30 to 40 nt. These synthetic fragments can
be annealed and ligated together to construct larger fragments. See, for example,
Sambrook et al., supra. Trans-cleaving catalytic RNAs (ribozymes) are RNA
molecules possessing endoribonuclease activity. Ribozymes are specifically designed
for a particular target, and the target message must contain a specific nucleotide
sequence. They are engineered to cleave any RNA species site-specifically in the
background of cellular RNA. The cleavage event renders the mRNA unstable and
prevents protein expression. Importantly, ribozymes can be used to inhibit expression
of a gene of unknown function for the purpose of determining its function in an in vitro
or in vivo context, by detecting the phenotypic effect.

Antisense nucleic acids are designed to specifically bind to RNA,
resulting in the formation of RNA-DNA or RNA-RNA hybrids, with an arrest of DNA
replication, reverse transcription or messenger RNA translation. Antisense
polynucleotides based on a selected polynucleotide sequence can interfere with
expression of the corresponding gene. Antisense polynucleotides are typically
generated within the cell by expression from antisense constructs that contain the
antisense strand as the transcribed strand. Antisense polynucleotides based on the
disclosed polynucleotides will bind and/or interfere with the translation of mRNA
comprising a sequence complementary to the antisense polynucleotide. The expression
products of control cells and cells treated with the antisense construct are compared to
detect the protein product of the gene corresponding to the polynucleotide upon which
the antisense construct is based. The protein is isolated and identified using routine
biochemical methods.
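The complementarity between an antisense polynucleotide and its target mRNA can be illustrated with a short sketch. It assumes a sense-strand DNA input written 5' to 3' with unambiguous bases and considers only Watson-Crick pairing; the function names are hypothetical.

```python
_DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")
_RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(sense_dna):
    """Return the antisense RNA (5' to 3') for a sense-strand DNA sequence.

    The result is the reverse complement of the mRNA that the sense strand
    represents, so it can base-pair with that mRNA along its full length.
    """
    return sense_dna.upper().translate(_DNA_TO_RNA_COMPLEMENT)[::-1]


def is_fully_complementary(mrna, antisense):
    """True when the two RNAs pair at every position in antiparallel orientation."""
    return len(mrna) == len(antisense) and all(
        _RNA_PAIRS[m] == a for m, a in zip(mrna, reversed(antisense))
    )
```

For example, the sense sequence `ATGGCC` corresponds to the mRNA `AUGGCC`, and its antisense RNA is `GGCCAU`.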

Given the extensive background literature and clinical experience in
antisense therapy, one skilled in the art can use selected polynucleotides of the
invention as additional potential therapeutics. The choice of polynucleotide can be
narrowed by first testing them for binding to "hot spot" regions of the genome of
cancerous cells. If a polynucleotide is identified as binding to a "hot spot," testing the
polynucleotide as an antisense compound in the corresponding cancer cells is
warranted.

Dominant negative mutations also are readily generated for
corresponding proteins that are active as homomultimers. A mutant polypeptide will
interact with wild-type polypeptides (made from the other allele) and form a
non-functional multimer. Thus, a mutation is in a substrate-binding domain, a catalytic
domain, or a cellular localization domain. Preferably, the mutant polypeptide will be
overproduced. Point mutations are made that have such an effect. In addition, fusion of
different polypeptides of various lengths to the terminus of a protein can yield dominant
negative mutants. General strategies are available for making dominant negative
mutants (see, e.g., Herskowitz, Nature (1987) 329:219). Such techniques can be used to
create loss of function mutations, which are useful for determining protein function.

Polypeptides and Variants Thereof 

The polypeptides of the invention include those encoded by the disclosed
polynucleotides, as well as nucleic acids that, by virtue of the degeneracy of the genetic
code, are not identical in sequence to the disclosed polynucleotides. Thus, the invention
includes within its scope a polypeptide encoded by a polynucleotide having the
sequence of any one of SEQ ID NOs: 1-3351 or a variant thereof.

In general, the term "polypeptide" as used herein refers to the full
length polypeptide encoded by the recited polynucleotide, the polypeptide encoded by
the gene represented by the recited polynucleotide, as well as portions or fragments
thereof. "Polypeptides" also includes variants of the naturally occurring proteins, where
such variants are homologous or substantially similar to the naturally occurring protein,
and can be of an origin of the same or different species as the naturally occurring
protein (e.g., human, murine, or some other species that naturally expresses the recited
polypeptide, usually a mammalian species). In general, variant polypeptides have a
sequence that has at least about 80%, usually at least about 90%, and more usually at
least about 98% sequence identity with a differentially expressed polypeptide of the
invention, as measured by BLAST using the parameters described above. The variant
polypeptides can be naturally or non-naturally glycosylated, i.e., the polypeptide has a
glycosylation pattern that differs from the glycosylation pattern found in the
corresponding naturally occurring protein.

The invention also encompasses homologs of the disclosed polypeptides
(or fragments thereof) where the homologs are isolated from other species, i.e., other
animal or plant species, usually mammalian species, e.g.,
rodents, such as mice, rats; domestic animals, e.g., horse, cow, dog, cat; and humans.
By "homolog" is meant a polypeptide having at least about 35%, usually at least about
40% and more usually at least about 60% amino acid sequence identity to a particular
differentially expressed protein as identified above, where sequence identity is
determined using the BLAST algorithm, with the parameters described above.

In general, the polypeptides of the subject invention are provided in a
non-naturally occurring environment, e.g., are separated from their naturally occurring
environment. In certain embodiments, the subject protein is present in a composition
that is enriched for the protein as compared to a control. As such, purified polypeptide
is provided, where by purified is meant that the protein is present in a composition that
is substantially free of non-differentially expressed polypeptides, where by substantially
free is meant that less than 90%, usually less than 60% and more usually less than 50%
of the composition is made up of non-differentially expressed polypeptides.

Also within the scope of the invention are variants; variants of
polypeptides include mutants, fragments, and fusions. Mutants can include amino acid
substitutions, additions or deletions. The amino acid substitutions can be conservative
amino acid substitutions or substitutions to eliminate non-essential amino acids, such as
to alter a glycosylation site, a phosphorylation site or an acetylation site, or to minimize
misfolding by substitution or deletion of one or more cysteine residues that are not
necessary for function. Conservative amino acid substitutions are those that preserve
the general charge, hydrophobicity/hydrophilicity, and/or steric bulk of the amino acid
substituted. Variants can be designed so as to retain biological activity of a particular
region of the protein (e.g., a functional domain and/or, where the polypeptide is a
member of a protein family, a region associated with a consensus sequence). Selection
of amino acid alterations for production of variants can be based upon the accessibility
(interior vs. exterior) of the amino acid (see, e.g., Go et al., Int. J. Peptide Protein Res.
(1980) 15:211), the thermostability of the variant polypeptide (see, e.g., Querol et al.,
Prot. Eng. (1996) 9:265), desired glycosylation sites (see, e.g., Olsen and Thomsen, J.
Gen. Microbiol. (1991) 137:579), desired disulfide bridges (see, e.g., Clarke et al.,
Biochemistry (1993) 32:4322; and Wakarchuk et al., Protein Eng. (1994) 7:1379),
desired metal binding sites (see, e.g., Toma et al., Biochemistry (1991) 30:97, and
Haezerbrouck et al., Protein Eng. (1993) 6:643), and desired substitutions within
proline loops (see, e.g., Masul et al., Appl. Env. Microbiol. (1994) 60:3579). Cysteine-
depleted muteins can be produced as disclosed in U.S. Patent No. 4,959,314.

Variants also include fragments of the polypeptides disclosed herein,
particularly biologically active fragments and/or fragments corresponding to functional
domains. Fragments of interest will typically be at least about 10 aa to at least about 15
aa in length, usually at least about 50 aa in length, and can be as long as 300 aa in length
or longer, but will usually not exceed about 1000 aa in length, where the fragment will
have a stretch of amino acids that is identical to a polypeptide encoded by a
polynucleotide having a sequence of any of SEQ ID NOs: 1-3351, or a homolog thereof.
The protein variants described herein are encoded by polynucleotides that are within the
scope of the invention. The genetic code can be used to select the appropriate codons to
construct the corresponding variants.

Computer-Related Embodiments

In general, a library of polynucleotides is a collection of sequence
information, which information is provided in either biochemical form (e.g., as a
collection of polynucleotide molecules), or in electronic form (e.g., as a collection of
polynucleotide sequences stored in a computer-readable form, as in a computer system
and/or as part of a computer program). The sequence information of the
polynucleotides can be used in a variety of ways, e.g., as a resource for gene discovery,
as a representation of sequences expressed in a selected cell type (e.g., cell type
markers), and/or as markers of a given disease or disease state. In general, a disease
marker is a representation of a gene product that is present in all cells affected by
disease either at an increased or decreased level relative to a normal cell (e.g., a cell of
the same or similar type that is not substantially affected by disease). For example, a
polynucleotide sequence in a library can be a polynucleotide that represents an mRNA,
polypeptide, or other gene product encoded by the polynucleotide, that is either
overexpressed or underexpressed in a breast ductal cell affected by cancer relative to a
normal (i.e., substantially disease-free) breast cell.

The nucleotide sequence information of the library can be embodied in
any suitable form, e.g., electronic or biochemical forms. For example, a library of
sequence information embodied in electronic form comprises an accessible computer
data file (or, in biochemical form, a collection of nucleic acid molecules) that contains
the representative nucleotide sequences of genes that are differentially expressed (e.g.,
overexpressed or underexpressed) as between, for example, i) a cancerous cell and a
normal cell; ii) a cancerous cell and a dysplastic cell; iii) a cancerous cell and a cell
affected by a disease or condition other than cancer; iv) a metastatic cancerous cell and
a normal cell and/or non-metastatic cancerous cell; v) a malignant cancerous cell and a
non-malignant cancerous cell (or a normal cell) and/or vi) a dysplastic cell relative to a
normal cell. Other combinations and comparisons of cells affected by various diseases
or stages of disease will be readily apparent to the ordinarily skilled artisan.
Biochemical embodiments of the library include a collection of nucleic acids that have
the sequences of the genes in the library, where the nucleic acids can correspond to the
entire gene in the library or to a fragment thereof, as described in greater detail below.

The polynucleotide libraries of the subject invention generally comprise
sequence information of a plurality of polynucleotide sequences, where at least one of
the polynucleotides has a sequence of any of SEQ ID NOs: 1-3351. By plurality is
meant at least 2, usually at least 3 and can include up to all of SEQ ID NOs: 1-3351.
The length and number of polynucleotides in the library will vary with the nature of the
library, e.g., if the library is an oligonucleotide array, a cDNA array, a computer
database of the sequence information, etc.

Where the library is an electronic library, the nucleic acid sequence
information can be present in a variety of media. "Media" refers to a manufacture,
other than an isolated nucleic acid molecule, that contains the sequence information of
the present invention. Such a manufacture provides the genome sequence or a subset
thereof in a form that can be examined by means not directly applicable to the sequence
as it exists in a nucleic acid. For example, the nucleotide sequence of the present
invention, e.g., the nucleic acid sequences of any of the polynucleotides of SEQ ID
NOs: 1-3351, can be recorded on computer readable media, e.g., any medium that can be
read and accessed directly by a computer. Such media include, but are not limited to:
magnetic storage media, such as a floppy disc, a hard disc storage medium, and a
magnetic tape; optical storage media such as CD-ROM; electrical storage media such as
RAM and ROM; and hybrids of these categories such as magnetic/optical storage
media. One of skill in the art can readily appreciate how any of the presently known
computer readable media can be used to create a manufacture comprising a recording
of the present sequence information. "Recorded" refers to a process for storing
information on computer readable medium, using any such methods as known in the art.
Any convenient data storage structure can be chosen, based on the means used to access
the stored information. A variety of data processor programs and formats can be used
for storage, e.g., word processing text file, database format, etc. In addition to the
sequence information, electronic versions of the libraries of the invention can be
provided in conjunction or connection with other computer-readable information and/or
other types of computer-readable files (e.g., searchable files, executable files, etc.,
including, but not limited to, for example, search program software, etc.).

By providing the nucleotide sequence in computer readable form, the
information can be accessed for a variety of purposes. Computer software to access
sequence information is publicly available. For example, the BLAST (Altschul et al.,
supra) and BLAZE (Brutlag et al. Comp. Chem. (1993) 17:203) search algorithms on a
Sybase system can be used to identify open reading frames (ORFs) within the genome
that contain homology to ORFs from other organisms.

As used herein, "a computer-based system" refers to the hardware
means, software means, and data storage means used to analyze the nucleotide sequence
information of the present invention. The minimum hardware of the computer-based
systems of the present invention comprises a central processing unit (CPU), input
means, output means, and data storage means. A skilled artisan can readily appreciate
that any one of the currently available computer-based systems is suitable for use in the
present invention. The data storage means can comprise any manufacture comprising a
recording of the present sequence information as described above, or a memory access
means that can access such a manufacture.

"Search means" refers to one or more programs implemented on the 
computer-based system, to compare a target sequence or target structural motif, or
expression levels of a polynucleotide in a sample, with the stored sequence information.
Search means can be used to identify fragments or regions of the genome that match a
particular target sequence or target motif. A variety of algorithms are publicly
known and commercially available, e.g., MacPattern (EMBL), BLASTN and BLASTX
(NCBI). A "target sequence" can be any polynucleotide or amino acid sequence of six
or more contiguous nucleotides or two or more amino acids, preferably from about 10
to 100 amino acids or from about 30 to 300 nt. A variety of comparing means can be
used to accomplish comparison of sequence information from a sample (e.g., to analyze
target sequences, target motifs, or relative expression levels) with the data storage
means. A skilled artisan can readily recognize that any one of the publicly available
homology search programs can be used as the search means for the computer-based
systems of the present invention to accomplish comparison of target sequences and
motifs. Computer programs to analyze expression levels in a sample and in controls are
also known in the art.
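
By way of illustration only, a very simple search means of the kind described above
can be sketched in Python. The function name find_matches, the in-memory dictionary
standing in for the data storage means, and the example sequences are all hypothetical
and are not part of the disclosure. The sketch performs exact matching only; publicly
available programs such as BLASTN perform far more sensitive homology comparisons.

```python
def find_matches(target, library):
    """Return (sequence_id, offset) pairs where target occurs exactly in a stored sequence.

    `library` is a hypothetical stand-in for the data storage means:
    a dict mapping a sequence identifier to its nucleotide string.
    """
    target = target.upper()
    # The text describes a target sequence as six or more contiguous nucleotides.
    if len(target) < 6:
        raise ValueError("target sequence must be at least six nucleotides")
    hits = []
    for seq_id, seq in library.items():
        seq = seq.upper()
        start = seq.find(target)
        while start != -1:
            hits.append((seq_id, start))
            # Continue one position later so overlapping matches are also found.
            start = seq.find(target, start + 1)
    return hits


# Made-up example sequences, not sequences of the Sequence Listing.
library = {
    "seq_A": "ATGGCGTACGTTAGCATGGCGTAC",
    "seq_B": "TTTTGGCGTACCCC",
}
hits = find_matches("GGCGTAC", library)
print(hits)
```

Offsets are zero-based positions within each stored sequence.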

A "target structural motif," or "target motif," refers to any rationally
selected sequence or combination of sequences in which the sequence(s) are chosen
based on a three-dimensional configuration that is formed upon the folding of the target
motif, or on consensus sequences of regulatory or active sites. There are a variety of
target motifs known in the art. Protein target motifs include, but are not limited to,
enzyme active sites and signal sequences. Nucleic acid target motifs include, but are
not limited to, hairpin structures, promoter sequences and other expression elements
such as binding sites for transcription factors.
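
As a minimal sketch of how a consensus-sequence target motif might be compared with
stored sequence information, the following Python example expands a consensus written
in IUPAC nucleotide codes into a regular expression. The function name motif_positions,
the consensus "TATAWAW" (used here only as an illustrative promoter-like motif), and the
example sequence are assumptions made for illustration and do not appear in the
disclosure.

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_positions(consensus, sequence):
    """Return zero-based start positions at which the consensus motif occurs."""
    pattern = "".join(IUPAC[code] for code in consensus.upper())
    # A lookahead is used so that overlapping occurrences are all reported.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", sequence.upper())]


# Made-up example sequence.
positions = motif_positions("TATAWAW", "GGCTATAAATAGCTATATATCC")
print(positions)
```

Motifs defined by three-dimensional configuration, such as hairpin structures, would
require a structure-aware comparison and are not addressed by this sketch.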

A variety of structural formats for the input and output means can be
used to input and output the information in the computer-based systems of the present
invention. One format for an output means ranks the relative expression levels of
different polynucleotides. Such presentation provides a skilled artisan with a ranking of
relative expression levels to determine a gene expression profile.
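
An output format of this kind can be sketched, purely for illustration, as a sort of
measured expression levels from highest to lowest. The function name rank_expression,
the polynucleotide labels, and the numeric levels are hypothetical values chosen for
the example.

```python
def rank_expression(levels):
    """Return (polynucleotide, level) pairs ordered from highest to lowest level."""
    return sorted(levels.items(), key=lambda item: item[1], reverse=True)


# Hypothetical relative expression levels in arbitrary units.
measured = {"poly_1": 3.2, "poly_2": 11.5, "poly_3": 0.7}
profile = rank_expression(measured)
for rank, (name, level) in enumerate(profile, start=1):
    print(rank, name, level)
```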

As discussed above, the "library" of the invention also encompasses
biochemical libraries of the polynucleotides of SEQ ID NOs: 1-3351, e.g., collections of
nucleic acids representing the provided polynucleotides. The biochemical libraries can
take a variety of forms, e.g., a solution of cDNAs, a pattern of probe nucleic acids stably
associated with a surface of a solid support (i.e., an array) and the like. Of particular
interest are nucleic acid arrays in which one or more of SEQ ID NOs: 1-3351 is
represented on the array. By array is meant an article of manufacture that has at least a
substrate with at least two distinct nucleic acid targets on one of its surfaces, where the
number of distinct nucleic acids can be considerably higher, typically being at least 10
nt, usually at least 20 nt and often at least 25 nt. A variety of different array formats
have been developed and are known to those of skill in the art. The arrays of the subject
invention find use in a variety of applications, including gene expression analysis, drug
screening, mutation analysis and the like, as disclosed in the above-listed exemplary
patent documents.

In addition to the above nucleic acid libraries, analogous libraries of
polypeptides are also provided, where the polypeptides of the library will
represent at least a portion of the polypeptides encoded by SEQ ID NOs: 1-3351.

Use of Polynucleotide Probes in Mapping, and in Tissue Profiling

Polynucleotide probes, generally comprising at least 12 contiguous nt of
a polynucleotide as shown in the Sequence Listing, are used for a variety of purposes,
such as chromosome mapping of the polynucleotide and detection of transcription
levels. Additional disclosure about preferred regions of the disclosed polynucleotide
sequences is found in the Examples. A probe that hybridizes specifically to a
polynucleotide disclosed herein should provide a detection signal at least 5-, 10-, or
20-fold higher than the background hybridization provided with other unrelated sequences.
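
The signal-to-background criterion stated above can be expressed as a simple ratio
test. The following Python sketch is illustrative only; the function name is_specific
and the numeric signal values are hypothetical, and the default threshold of 5 is
simply the lowest of the fold values recited above.

```python
def is_specific(signal, background, min_fold=5.0):
    """Return True if the probe signal is at least min_fold times the background."""
    if background <= 0:
        raise ValueError("background signal must be positive")
    return signal / background >= min_fold


# Hypothetical detection signals in arbitrary units.
print(is_specific(1200.0, 100.0))                 # 12-fold over background
print(is_specific(1200.0, 100.0, min_fold=20.0))  # fails the stricter 20-fold test
```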

Detection of Expression Levels. Nucleotide probes are used to detect
expression of a gene corresponding to the provided polynucleotide. In Northern blots,
mRNA is separated electrophoretically and contacted with a probe. A probe is detected
as hybridizing to an mRNA species of a particular size. The amount of hybridization is
quantitated to determine relative amounts of expression, for example under a particular
condition. Probes are used for in situ hybridization to cells to detect expression. Probes
can also be used in vivo for diagnostic detection of hybridizing sequences. Probes are
typically labeled with a radioactive isotope. Other types of detectable labels can be
used such as chromophores, fluors, and enzymes. Other examples of nucleotide
hybridization assays are described in WO92/02526 and U.S. Patent No. 5,124,246.

Alternatively, the Polymerase Chain Reaction (PCR) is another means
for detecting small amounts of target nucleic acids (see, e.g., Mullis et al., Meth.
Enzymol. (1987) 155:335; U.S. Patent No. 4,683,195; and U.S. Patent No. 4,683,202).
Two primer polynucleotides that hybridize with the target nucleic acids are
used to prime the reaction. The primers can be composed of sequence within or 3' and
5' to the polynucleotides of the Sequence Listing. Alternatively, if the primers are 3' and
5' to these polynucleotides, they need not hybridize to them or the complements. After
amplification of the target with a thermostable polymerase, the amplified target nucleic
acids can be detected by methods known in the art, e.g., Southern blot. mRNA or
cDNA can also be detected by traditional blotting techniques (e.g., Southern blot,
Northern blot, etc.) described in Sambrook et al., "Molecular Cloning: A Laboratory
Manual" (New York, Cold Spring Harbor Laboratory, 1989) (e.g., without PCR
amplification). In general, mRNA or cDNA generated from mRNA using a polymerase
enzyme can be purified and separated using gel electrophoresis, and transferred to a
solid support, such as nitrocellulose. The solid support is exposed to a labeled probe,
washed to remove any unhybridized probe, and duplexes containing the labeled probe
are detected.

Mapping. Polynucleotides of the present invention can be used to
identify a chromosome on which the corresponding gene resides. Such mapping can be
useful in identifying the function of the polynucleotide-related gene by its proximity to
other genes with known function. Function can also be assigned to the
polynucleotide-related gene when particular syndromes or diseases map to the same
chromosome. For example, use of polynucleotide probes in identification and
quantification of nucleic acid sequence aberrations is described in U.S. Patent No.
5,783,387. An exemplary mapping method is fluorescence in situ hybridization (FISH),
which facilitates comparative genomic hybridization to allow total genome assessment of
changes in relative copy number of DNA sequences (see, e.g., Valdes et al., Methods in
Molecular Biology (1997) 65:1). Polynucleotides can also be mapped to particular
chromosomes using, for example, radiation hybrids or chromosome-specific hybrid panels.
See Leach et al., Advances in Genetics, (1995) 35:63-99; Walter et al., Nature Genetics
(1994) 7:22; Walter and Goodfellow, Trends in Genetics (1992) 9:352. Panels for radiation
hybrid mapping are available from Research Genetics, Inc., Huntsville, Alabama, USA.
The statistical program RHMAP can be used to construct a map based on the data from
radiation hybridization with a measure of the relative likelihood of one order versus
another. RHMAP is available via the world wide web at
http://www.sph.umich.edu/group/statgen/software. In addition, commercial programs are
available for identifying regions of chromosomes commonly associated with disease, such
as cancer.

Tissue Typing or Profiling. Expression of specific mRNA
corresponding to the provided polynucleotides can vary in different cell types and can
be tissue-specific. This variation of mRNA levels in different cell types can be
exploited with nucleic acid probe assays to determine tissue types. For example, PCR,
branched DNA probe assays, or blotting techniques utilizing nucleic acid probes
substantially identical or complementary to polynucleotides listed in the Sequence
Listing can determine the presence or absence of the corresponding cDNA or mRNA.

Tissue typing can be used to identify the developmental organ or tissue
source of a metastatic lesion by identifying the expression of a particular marker of that
organ or tissue. If a polynucleotide is expressed only in a specific tissue type, and a
metastatic lesion is found to express that polynucleotide, then the developmental source
of the lesion has been identified. Expression of a particular polynucleotide can be

Use of Polymorphisms. A polynucleotide of the invention can be used in
forensics, genetic analysis, mapping, and diagnostic applications where the
corresponding region of a gene is polymorphic in the human population. Any means for
detecting a polymorphism in a gene can be used, including, but not limited to,
electrophoresis of protein polymorphic variants, differential sensitivity to restriction
enzyme cleavage, and hybridization to allele-specific probes.



Antibody Production

Expression products of a polynucleotide of the invention, as well as the
corresponding mRNA, cDNA, or complete gene, can be prepared and used for raising
antibodies for experimental, diagnostic, and therapeutic purposes. For polynucleotides
to which a corresponding gene has not been assigned, this provides an additional
method of identifying the corresponding gene. The polynucleotide or related cDNA is
expressed as described above, and antibodies are prepared. These antibodies are
specific to an epitope on the polypeptide encoded by the polynucleotide, and can
precipitate or bind to the corresponding native protein in a cell or tissue preparation or
in a cell-free extract of an in vitro expression system.

Methods for production of monoclonal and polyclonal antibodies that
specifically bind a selected antigen are well known in the art. The antibodies
specifically bind to epitopes present in the polypeptides encoded by polynucleotides
disclosed in the Sequence Listing. Typically, at least 6, 8, 10, or 12 contiguous amino
acids are required to form an epitope. Epitopes that involve non-contiguous amino
acids may require a longer polypeptide, e.g., at least 15, 25, or 50 amino acids.
Antibodies that specifically bind to human polypeptides encoded by the provided
polynucleotides should provide a detection signal at least 5-, 10-, or 20-fold higher than
a detection signal provided with other proteins when used in Western blots or other
immunochemical assays. Preferably, antibodies that specifically bind polypeptides of the
invention do not bind to other proteins in immunochemical assays at detectable levels
and can immunoprecipitate the specific polypeptide from solution.

The invention also contemplates naturally occurring antibodies specific
for a polypeptide of the invention. For example, serum antibodies to a polypeptide of
the invention in a human population can be purified by methods well known in the art,
e.g., by passing antiserum over a column to which the corresponding selected
polypeptide or fusion protein is bound. The bound antibodies can then be eluted from
the column, for example using a buffer with a high salt concentration.

In addition to the antibodies discussed above, the invention also
contemplates genetically engineered antibodies and antibody derivatives (e.g., single
chain antibodies, antibody fragments (e.g., Fab, etc.)), prepared according to methods
well known in the art.

Other embodiments of the present invention include humanized
monoclonal antibodies capable of binding to the polypeptides of the invention. The
phrase "humanized antibody" refers to an antibody derived from a non-human antibody,
typically a mouse monoclonal antibody. Alternatively, a humanized antibody may be
derived from a chimeric antibody that retains or substantially retains the
antigen-binding properties of the parental, non-human, antibody but which exhibits
diminished immunogenicity as compared to the parental antibody when administered to
humans. The phrase "chimeric antibody," as used herein, refers to an antibody containing
sequence derived from two different antibodies (see, e.g., U.S. Patent No. 4,816,567)
which typically originate from different species. Most typically, chimeric antibodies
comprise human and murine antibody fragments, generally human constant and mouse
variable regions.

Because humanized antibodies are far less immunogenic in humans than
the parental mouse monoclonal antibodies, they can be used for the treatment of humans
with far less risk of anaphylaxis. Thus, these antibodies may be preferred in therapeutic
applications that involve in vivo administration to a human such as, e.g., use as radiation
sensitizers for the treatment of neoplastic disease or use in methods to reduce the side
effects of, e.g., cancer therapy.

Humanized antibodies may be achieved by a variety of methods
including, for example: (1) grafting the non-human complementarity determining
regions (CDRs) onto a human framework and constant region (a process referred to in
the art as "humanizing"), or, alternatively, (2) transplanting the entire non-human
variable domains, but "cloaking" them with a human-like surface by replacement of
surface residues (a process referred to in the art as "veneering"). In the present
invention, humanized antibodies will include both "humanized" and "veneered"
antibodies. These methods are disclosed in, e.g., Jones et al., Nature 321:522-525
(1986); Morrison et al., Proc. Natl. Acad. Sci. U.S.A., 81:6851-6855 (1984); Morrison
and Oi, Adv. Immunol., 44:65-92 (1988); Verhoeyen et al., Science 239:1534-1536
(1988); Padlan, Molec. Immun. 28:489-498 (1991); Padlan, Molec. Immunol. 31(3):169-217
(1994); and Kettleborough, C.A. et al., Protein Eng. 4(7):773-83 (1991), each of
which is incorporated herein by reference.

The phrase "complementarity determining region" refers to amino acid
sequences which together define the binding affinity and specificity of the natural Fv
region of a native immunoglobulin binding site. See, e.g., Chothia et al., J. Mol. Biol.
196:901-917 (1987); Kabat et al., U.S. Dept. of Health and Human Services NIH
Publication No. 91-3242 (1991). The phrase "constant region" refers to the portion of
the antibody molecule that confers effector functions. In the present invention, mouse
constant regions are substituted by human constant regions. The constant regions of the
subject humanized antibodies are derived from human immunoglobulins. The heavy
chain constant region can be selected from any of the five isotypes: alpha, delta,
epsilon, gamma or mu.

One method of humanizing antibodies comprises aligning the non-human
heavy and light chain sequences to human heavy and light chain sequences,
selecting and replacing the non-human framework with a human framework based on
such alignment, molecular modeling to predict the conformation of the humanized
sequence and comparing to the conformation of the parent antibody. This process is
followed by repeated back mutation of residues in the CDR region which disturb the
structure of the CDRs until the predicted conformation of the humanized sequence
model closely approximates the conformation of the non-human CDRs of the parent
non-human antibody. Such humanized antibodies may be further derivatized to
facilitate uptake and clearance, e.g., via Ashwell receptors. See, e.g., U.S. Patent Nos.
5,530,101 and 5,585,089, which patents are incorporated herein by reference.

Humanized antibodies can also be produced using transgenic animals
that are engineered to contain human immunoglobulin loci. For example, WO
98/24893 discloses transgenic animals having a human Ig locus wherein the animals do
not produce functional endogenous immunoglobulins due to the inactivation of
endogenous heavy and light chain loci. WO 91/10741 also discloses transgenic
non-primate mammalian hosts capable of mounting an immune response to an immunogen,
wherein the antibodies have primate constant and/or variable regions, and wherein the
endogenous immunoglobulin-encoding loci are substituted or inactivated. WO
96/30498 discloses the use of the Cre/Lox system to modify the immunoglobulin locus
in a mammal, such as to replace all or a portion of the constant or variable region to
form a modified antibody molecule. WO 94/02602 discloses non-human mammalian
hosts having inactivated endogenous Ig loci and functional human Ig loci. U.S. Patent
No. 5,939,598 discloses methods of making transgenic mice in which the mice lack
endogenous heavy chains, and express an exogenous immunoglobulin locus comprising
one or more xenogeneic constant regions.

Using a transgenic animal described above, an immune response can be
produced to a selected antigenic molecule, and antibody-producing cells can be
removed from the animal and used to produce hybridomas that secrete human
monoclonal antibodies. Immunization protocols, adjuvants, and the like are known in
the art, and are used in immunization of, for example, a transgenic mouse as described
in WO 96/33735. This publication discloses monoclonal antibodies against a variety of
antigenic molecules including IL-6, IL-8, TNF, human CD4, L-selectin, gp39, and
tetanus toxin. The monoclonal antibodies can be tested for the ability to inhibit or
neutralize the biological activity or physiological effect of the corresponding protein.
WO 96/33735 discloses that monoclonal antibodies against IL-8, derived from immune
cells of transgenic mice immunized with IL-8, blocked IL-8-induced functions of
neutrophils. Human monoclonal antibodies with specificity for the antigen used to
immunize transgenic animals are also disclosed in WO 96/34096.

Polynucleotides or Arrays for Diagnostics 

Polynucleotide arrays are created by spotting polynucleotide probes onto
a substrate (e.g., glass, nitrocellulose, etc.) in a two-dimensional matrix or array having
bound probes. The probes can be bound to the substrate by either covalent bonds or by
non-specific interactions, such as hydrophobic interactions. Samples of polynucleotides
can be detectably labeled (e.g., using radioactive or fluorescent labels) and then
hybridized to the probes. Double stranded polynucleotides, comprising the labeled
sample polynucleotides bound to probe polynucleotides, can be detected once the
unbound portion of the sample is washed away. Techniques for constructing arrays and
methods of using these arrays are described in EP 799 897; WO 97/29212; WO
97/27317; EP 785 280; WO 97/02357; U.S. Patent No. 5,593,839; U.S. Patent No.
5,578,832; EP 728 520; U.S. Patent No. 5,599,695; EP 721 016; U.S. Patent No.
5,556,752; WO 95/22058; and U.S. Patent No. 5,631,734. Arrays can be used to, for
example, examine differential expression of genes and can be used to determine gene
function. For example, arrays can be used to detect differential expression of a
polynucleotide between a test cell and control cell (e.g., cancer cells and normal cells).
For example, high expression of a particular message in a cancer cell, which is not
observed in a corresponding normal cell, can indicate a cancer specific gene product.
Exemplary uses of arrays are further described in, for example, Pappalarado et al., Sem.
Radiation Oncol. (1998) 8:217; and Ramsay, Nature Biotechnol. (1998) 16:40.

Differential Expression in Diagnosis 

The polynucleotides of the invention can also be used to detect
differences in expression levels between two cells, e.g., as a method to identify
abnormal or diseased tissue in a human. For polynucleotides corresponding to profiles
of protein families, the choice of tissue can be selected according to the putative
biological function. In general, the expression of a gene corresponding to a specific
polynucleotide is compared between a first tissue that is suspected of being diseased
and a second, normal tissue of the human. The tissue suspected of being abnormal or
diseased can be derived from a different tissue type of the human, but preferably it is
derived from the same tissue type; for example an intestinal polyp or other abnormal
growth should be compared with normal intestinal tissue. The normal tissue can be the
same tissue as that of the test sample, or any normal tissue of the patient, especially
those that express the polynucleotide-related gene of interest (e.g., brain, thymus, testis,
heart, prostate, placenta, spleen, small intestine, skeletal muscle, pancreas, and the
mucosal lining of the colon). A difference between the polynucleotide-related gene,
mRNA, or protein in the two tissues which are compared, for example in molecular
weight, amino acid or nucleotide sequence, or relative abundance, indicates a change in
the gene, or a gene which regulates it, in the tissue of the human that was suspected of
being diseased. Examples of detection of differential expression and its use in diagnosis
of cancer are described in U.S. Patent Nos. 5,688,641 and 5,677,125.

A genetic predisposition to disease in a human can also be detected by
comparing expression levels of an mRNA or protein corresponding to a polynucleotide
of the invention in a fetal tissue with levels associated with normal fetal tissue. Fetal
tissues that are used for this purpose include, but are not limited to, amniotic fluid,
chorionic villi, blood, and the blastomere of an in vitro-fertilized embryo. The
comparable normal polynucleotide-related gene is obtained from any tissue. The mRNA
or protein is obtained from a normal tissue of a human in which the
polynucleotide-related gene is expressed. Differences such as alterations in the
nucleotide sequence or size of the same product of the fetal polynucleotide-related gene
or mRNA, or alterations in the molecular weight, amino acid sequence, or relative
abundance of fetal protein, can indicate a germline mutation in the polynucleotide-related
gene of the fetus, which indicates a genetic predisposition to disease. In general,
diagnostic, prognostic, and other methods of the invention based on differential
expression involve detection of a level or amount of a gene product, particularly a
differentially expressed gene product, in a test sample obtained from a patient suspected
of having or being susceptible to a disease (e.g., breast cancer, lung cancer, colon
cancer and/or metastatic forms thereof), and comparing the detected levels to those
levels found in normal cells (e.g., cells substantially unaffected by cancer) and/or other
control cells (e.g., to differentiate a cancerous cell from a cell affected by dysplasia).
Furthermore, the severity of the disease can be assessed by comparing the detected
levels of a differentially expressed gene product with those levels detected in samples
representing the levels of differentially expressed gene product associated with varying
degrees of severity of disease. It should be noted that use of the term "diagnostic"
herein is not necessarily meant to exclude "prognostic" or "prognosis," but rather is
used as a matter of convenience.

The term "differentially expressed gene" is generally intended to
encompass a polynucleotide that can, for example, include an open reading frame
encoding a gene product (e.g., a polypeptide), and/or introns of such genes and adjacent
5' and 3' non-coding nucleotide sequences involved in the regulation of expression, up
to about 20 kb beyond the coding region, but possibly further in either direction. The
gene can be introduced into an appropriate vector for extrachromosomal maintenance or
for integration into a host genome. In general, a difference in expression level
associated with a decrease in expression level of at least about 25%, usually at least
about 50% to 75%, more usually at least about 90% or more is indicative of a
differentially expressed gene of interest, i.e., a gene that is underexpressed or
down-regulated in the test sample relative to a control sample. Furthermore, a difference
in expression level associated with an increase in expression of at least about 25%, usually
at least about 50% to 75%, more usually at least about 90% and can be at least about
1½-fold, usually at least about 2-fold to about 10-fold, and can be about 100-fold to
about 1,000-fold increase relative to a control sample is indicative of a differentially
expressed gene of interest, i.e., an overexpressed or up-regulated gene.
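
The thresholds described above can be illustrated with a short Python sketch that
classifies a gene by its fractional change relative to a control sample. The function
name classify_expression and the numeric levels are hypothetical. The default cutoff of
0.25 corresponds to the "at least about 25%" figure recited above, and a stricter cutoff
(e.g., 0.5 or 0.9) can be supplied in its place.

```python
def classify_expression(test_level, control_level, min_change=0.25):
    """Classify a gene by its fractional change in the test sample versus the control."""
    if control_level <= 0:
        raise ValueError("control level must be positive")
    # Fractional change: +1.0 means a doubling (2-fold), -0.5 means a 50% decrease.
    change = (test_level - control_level) / control_level
    if change >= min_change:
        return "up-regulated"
    if change <= -min_change:
        return "down-regulated"
    return "not differentially expressed"


# Hypothetical expression levels in arbitrary units.
print(classify_expression(200.0, 100.0))  # 2-fold increase
print(classify_expression(50.0, 100.0))   # 50% decrease
print(classify_expression(110.0, 100.0))  # 10% increase, below the cutoff
```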

"Differentially expressed polynucleotide" as used herein means a nucleic
acid molecule (RNA or DNA) comprising a sequence that represents a differentially
expressed gene, e.g., the differentially expressed polynucleotide comprises a sequence
(e.g., an open reading frame encoding a gene product) that uniquely identifies a
differentially expressed gene so that detection of the differentially expressed
polynucleotide in a sample is correlated with the presence of a differentially expressed
gene in a sample. "Differentially expressed polynucleotides" is also meant to
encompass fragments of the disclosed polynucleotides, e.g., fragments retaining
biological activity, as well as nucleic acids homologous, substantially similar, or
substantially identical (e.g., having about 90% sequence identity) to the disclosed
polynucleotides.



"Diagnosis" as used herein generally includes determination of a
subject's susceptibility to a disease or disorder, determination as to whether a subject is
presently affected by a disease or disorder, as well as the prognosis of a subject
affected by a disease or disorder (e.g., identification of pre-metastatic or metastatic
cancerous states, stages of cancer, or responsiveness of cancer to therapy). The present
invention particularly encompasses diagnosis of subjects in the context of breast cancer
(e.g., carcinoma in situ (e.g., ductal carcinoma in situ), estrogen receptor (ER)-positive
breast cancer, ER-negative breast cancer, or other forms and/or stages of breast cancer),
lung cancer (e.g., small cell carcinoma, non-small cell carcinoma, mesothelioma, and
other forms and/or stages of lung cancer), and colon cancer (e.g., adenomatous polyp,
colorectal carcinoma, and other forms and/or stages of colon cancer).

"Sample" or "biological sample" as used throughout herein is generally
meant to refer to samples of biological fluids or tissues, particularly samples obtained
from tissues, especially from cells of the type associated with the disease for which the
diagnostic application is designed (e.g., ductal adenocarcinoma), and the like.
"Samples" is also meant to encompass derivatives and fractions of such samples (e.g.,
cell lysates). Where the sample is solid tissue, the cells of the tissue can be dissociated
or tissue sections can be analyzed.

Methods of the subject invention useful in diagnosis or prognosis
typically involve comparison of the abundance of a selected differentially expressed
gene product in a sample of interest with that of a control to determine any relative
differences in the expression of the gene product, where the difference can be measured
qualitatively and/or quantitatively. Quantitation can be accomplished, for example, by
comparing the level of expression product detected in the sample with the amounts of
product present in a standard curve. A comparison can be made visually; by using a
technique such as densitometry, with or without computerized assistance; by preparing
a representative library of cDNA clones of mRNA isolated from a test sample,
sequencing the clones in the library to determine the number of cDNA clones
corresponding to the same gene product, and analyzing the number of clones
corresponding to that same gene product relative to the number of clones of the same
gene product in a control sample; or by using an array to detect relative levels of
hybridization to a selected sequence or set of sequences, and comparing the
hybridization pattern to that of a control. The differences in expression are then
correlated with the presence or absence of an abnormal expression pattern. A variety of
different methods for determining the nucleic acid abundance in a sample are known to
those of skill in the art (see, e.g., WO 97/27317). In general, diagnostic assays of the
invention involve detection of a gene product of the polynucleotide sequence (e.g.,
mRNA or polypeptide) that corresponds to a sequence of SEQ ID NOs: 1-3351. The
patient from whom the sample is obtained can be apparently healthy, susceptible to
disease (e.g., as determined by family history or exposure to certain environmental
factors), or can already be identified as having a condition in which altered expression
of a gene product of the invention is implicated.
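
Quantitation against a standard curve, as mentioned above, can be sketched in Python
as a linear interpolation between calibration points. This is a minimal illustration
under stated assumptions: the function name interpolate_amount, the calibration values,
and the choice of piecewise-linear interpolation are hypothetical, and the calibration
signals are assumed to be distinct. Other curve-fitting approaches may be more
appropriate for a given assay.

```python
def interpolate_amount(signal, standard_curve):
    """Estimate the amount of product for a measured signal.

    `standard_curve` is a list of (signal, known_amount) calibration points
    with distinct signal values. The estimate is obtained by linear
    interpolation between the two calibration points that bracket the signal.
    """
    points = sorted(standard_curve)
    for (s0, a0), (s1, a1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            return a0 + (a1 - a0) * (signal - s0) / (s1 - s0)
    raise ValueError("signal lies outside the range of the standard curve")


# Hypothetical calibration points: (detected signal, known amount of product).
curve = [(100.0, 1.0), (200.0, 2.0), (400.0, 4.0)]
estimate = interpolate_amount(300.0, curve)
print(estimate)
```

Signals outside the calibrated range raise an error rather than being extrapolated.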

Diagnosis can be based on detected expression
levels of a gene product encoded by at least one, preferably at least two or more, at least
3 or more, or at least 4 or more of the polynucleotides having a sequence set forth in
SEQ ID NOs: 1-3351, and can involve detection of expression of genes corresponding to 
all of SEQ ID NOs: 1-3351 and/or additional sequences that can serve as additional 
diagnostic markers and/or reference sequences. Where the diagnostic method is 
designed to detect the presence or susceptibility of a patient to cancer, the assay
preferably involves detection of a gene product encoded by a gene corresponding to a 
polynucleotide that is differentially expressed in cancer. Examples of such differentially 
expressed polynucleotides are described in the Examples below. Given the provided 
polynucleotides and information regarding their relative expression levels provided
herein, assays using such polynucleotides and detection of their expression levels in
diagnosis and prognosis will be readily apparent to the ordinarily skilled artisan. 

Any of a variety of detectable labels can be used in connection with the 
various embodiments of the diagnostic methods of the invention. Suitable detectable 
labels include fluorochromes (e.g., fluorescein isothiocyanate (FITC), rhodamine,
Texas Red, phycoerythrin, allophycocyanin, 6-carboxyfluorescein (6-FAM),
2',7'-dimethoxy-4',5'-dichloro-6-carboxyfluorescein (JOE), 6-carboxy-X-rhodamine (ROX),
6-carboxy-2',4',7',4,7-hexachlorofluorescein (HEX), 5-carboxyfluorescein (5-FAM) or
N,N,N',N'-tetramethyl-6-carboxyrhodamine (TAMRA)), radioactive labels (e.g., 32P,
35S, 3H, etc.), and the like. The detectable label can involve a two stage system (e.g.,
biotin-avidin, hapten-anti-hapten antibody, etc.).

Reagents specific for the polynucleotides and polypeptides of the 
invention, such as antibodies and nucleotide probes, can be supplied in a kit for 
detecting the presence of an expression product in a biological sample. The kit can also
contain buffers or labeling components, as well as instructions for using the reagents to
detect and quantify expression products in the biological sample. Exemplary
embodiments of the diagnostic methods of the invention are described below in more 
detail. 

Polypeptide detection in diagnosis. In one embodiment, the test sample
is assayed for the level of a differentially expressed polypeptide. Diagnosis can be
accomplished using any of a number of methods to determine the absence or presence
or altered amounts of the differentially expressed polypeptide in the test sample. For 
example, detection can utilize staining of cells or histological sections with labeled 
antibodies, performed in accordance with conventional methods. Cells can be 
permeabilized to stain cytoplasmic molecules. In general, antibodies that specifically
bind a differentially expressed polypeptide of the invention are added to a sample, and
incubated for a period of time sufficient to allow binding to the epitope, usually at least
about 10 minutes. The antibody can be detectably labeled for direct detection (e.g., 
using radioisotopes, enzymes, fluorescers, chemiluminescers, and the like), or can be 
used in conjunction with a second stage antibody or reagent to detect binding (e.g., 
biotin with horseradish peroxidase-conjugated avidin, a secondary antibody conjugated
to a fluorescent compound, e.g., fluorescein, rhodamine, Texas red, etc.). The absence 
or presence of antibody binding can be determined by various methods, including flow 
cytometry of dissociated cells, microscopy, radiography, scintillation counting, etc. 
Any suitable alternative method of qualitative or quantitative detection of levels or
amounts of differentially expressed polypeptide can be used, for example ELISA,
western blot, immunoprecipitation, radioimmunoassay, etc. 

mRNA detection. The diagnostic methods of the invention can also or
alternatively involve detection of mRNA encoded by a gene corresponding to a
differentially expressed polynucleotide of the invention. Any suitable qualitative or
quantitative methods known in the art for detecting specific mRNAs can be used.
mRNA can be detected by, for example, in situ hybridization in tissue sections, by 
reverse transcriptase-PCR, or in Northern blots containing poly A+ mRNA. One of 
skill in the art can readily use these methods to determine differences in the size or 
amount of mRNA transcripts between two samples. mRNA expression levels in a
sample can also be determined by generation of a library of expressed sequence tags
(ESTs) from the sample, where the EST library is representative of sequences present in 
the sample (Adams, et al., (1991) Science 252:1651). Enumeration of the relative 
representation of ESTs within the library can be used to approximate the relative 
representation of the gene transcript within the starting sample. The results of EST
analysis of a test sample can then be compared to EST analysis of a reference sample to
determine the relative expression levels of a selected polynucleotide, particularly a 
polynucleotide corresponding to one or more of the differentially expressed genes 
described herein. Alternatively, gene expression in a test sample can be analyzed
using serial analysis of gene expression (SAGE) methodology (e.g., Velculescu et al.,
Science (1995) 270:484) or differential display (DD) methodology (see, e.g., U.S.
Patent Nos. 5,776,683 and 5,807,680).
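
The enumeration of ESTs described above can be sketched, by way of a non-limiting example, as a simple counting exercise. The sketch assumes that each EST has already been assigned to a gene; the gene labels, library contents, and function names are hypothetical.

```python
from collections import Counter

# Illustrative sketch only: gene labels and library contents are made up.


def relative_representation(est_gene_ids):
    """Fraction of ESTs in a library that were assigned to each gene."""
    counts = Counter(est_gene_ids)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def expression_ratio(test_ests, reference_ests, gene):
    """Relative representation of a gene in the test library divided by that
    in the reference library. Returns None when the gene is absent from the
    reference library, since no ratio can be formed."""
    test_freq = relative_representation(test_ests).get(gene, 0.0)
    ref_freq = relative_representation(reference_ests).get(gene, 0.0)
    if ref_freq == 0.0:
        return None
    return test_freq / ref_freq


# Each entry stands for one sequenced EST, labeled by the gene it matched.
test_library = ["geneA"] * 6 + ["geneB"] * 2 + ["geneC"] * 2
reference_library = ["geneA"] * 2 + ["geneB"] * 4 + ["geneC"] * 4

ratio_a = expression_ratio(test_library, reference_library, "geneA")
```

Normalizing by library size, as done here, lets libraries of different depth be compared; with small libraries the counts are noisy, so a ratio of this kind approximates rather than measures relative expression.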

Alternatively, gene expression can be analyzed using hybridization 
analysis. Oligonucleotides or cDNA can be used to selectively identify or capture DNA 
or RNA of specific sequence composition, and the amount of RNA or cDNA hybridized
to a known capture sequence determined qualitatively or quantitatively, to provide
information about the relative representation of a particular message within the pool of
cellular messages in a sample. Hybridization analysis can be designed to allow for 
concurrent screening of the relative expression of hundreds to thousands of genes by 
using, for example, array-based technologies having high density formats, including 
filters, microscope slides, or microchips, or solution-based technologies that use
spectroscopic analysis (e.g., mass spectrometry). One exemplary use of arrays in the 
diagnostic methods of the invention is described below in more detail. 

Use of a single gene in diagnostic applications. The diagnostic methods
of the invention can focus on the expression of a single differentially expressed gene.
For example, the diagnostic method can involve detecting a differentially expressed
gene, or a polymorphism of such a gene (e.g., a polymorphism in a coding region or
control region), that is associated with disease. Disease-associated polymorphisms can 
include deletion or truncation of the gene, mutations that alter expression level and/or 
affect activity of the encoded protein, etc. 

A number of methods are available for analyzing nucleic acids for the
presence of a specific sequence, e.g., a disease associated polymorphism. Where large
amounts of DNA are available, genomic DNA is used directly. Alternatively, the 
region of interest is cloned into a suitable vector and grown in sufficient quantity for 
analysis. Cells that express a differentially expressed gene can be used as a source of
mRNA, which can be assayed directly or reverse transcribed into cDNA for analysis.
The nucleic acid can be amplified by conventional techniques, such as the polymerase 
chain reaction (PCR), to provide sufficient amounts for analysis, and a detectable label
can be included in the amplification reaction (e.g., using a detectably labeled primer or
detectably labeled oligonucleotides) to facilitate detection. Alternatively, various
methods are also known in the art that utilize oligonucleotide ligation as a means of
detecting polymorphisms, see, e.g., Riley et al., Nucl. Acids Res. (1990) 18:2887; and
Delahunty et al., Am. J. Hum. Genet. (1996) 58:1239.

The amplified or cloned sample nucleic acid can be analyzed by one of a 
number of methods known in the art. The nucleic acid can be sequenced by dideoxy or 
other methods, and the sequence of bases compared to a selected sequence, e.g., to a
wild-type sequence. Hybridization with the polymorphic or variant sequence can also 
be used to determine its presence in a sample (e.g., by Southern blot, dot blot, etc.). The 
hybridization pattern of a polymorphic or variant sequence and a control sequence to an 
array of oligonucleotide probes immobilized on a solid support, as described in U.S. 
Patent No. 5,445,934, or in WO 95/35505, can also be used as a means of identifying
polymorphic or variant sequences associated with disease. Single strand
conformational polymorphism (SSCP) analysis, denaturing gradient gel electrophoresis 
(DGGE), and heteroduplex analysis in gel matrices are used to detect conformational 
changes created by DNA sequence variation as alterations in electrophoretic mobility. 
Alternatively, where a polymorphism creates or destroys a recognition site for a
restriction endonuclease, the sample is digested with that endonuclease, and the 
products size fractionated to determine whether the fragment was digested. 
Fractionation is performed by gel or capillary electrophoresis, particularly acrylamide or 
agarose gels. 
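
The restriction endonuclease approach described above can be illustrated with the following sketch, which computes the fragment sizes expected after digestion. The two short sequences are hypothetical and only one strand is modeled; the EcoRI recognition site (GAATTC, cut after the first G) is used as a familiar example.

```python
# Illustrative sketch only: the sequences below are invented for the example.


def digest_fragment_lengths(sequence, site, cut_offset):
    """Return the fragment lengths produced by cutting at every occurrence
    of the recognition site. cut_offset is the position of the cut within
    the site, counted from the first base of the site."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


ECORI_SITE = "GAATTC"   # EcoRI cuts G^AATTC, i.e. one base into the site
ECORI_CUT_OFFSET = 1

wild_type = "ATGCCGAATTCGGTACCTTAG"
variant = "ATGCCGAGTTCGGTACCTTAG"  # assumed A>G change that destroys the site

wild_type_fragments = digest_fragment_lengths(wild_type, ECORI_SITE, ECORI_CUT_OFFSET)
variant_fragments = digest_fragment_lengths(variant, ECORI_SITE, ECORI_CUT_OFFSET)
```

In this example the wild-type sequence yields two fragments while the variant remains uncut, which is the difference that size fractionation by gel or capillary electrophoresis would reveal.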

Screening for mutations in a gene can be based on the functional or
antigenic characteristics of the protein. Protein truncation assays are useful in detecting
deletions that can affect the biological activity of the protein. Various immunoassays 
designed to detect polymorphisms in proteins can be used in screening. Where many 
diverse genetic mutations lead to a particular disease phenotype, functional protein
assays have proven to be effective screening tools. The activity of the encoded protein
can be determined by comparison with the wild-type protein. 

Pattern matching in diagnosis using arrays. In another embodiment, the
diagnostic and/or prognostic methods of the invention involve detection of expression
of a selected set of genes in a test sample to produce a test expression pattern (TEP).
The TEP is compared to a reference expression pattern (REP), which is generated by
detection of expression of the selected set of genes in a reference sample (e.g., a 
positive or negative control sample). The selected set of genes includes at least one of 
the genes of the invention, which genes correspond to the polynucleotide sequences of 
SEQ ID NOs: 1-3351. Of particular interest is a selected set of genes that includes genes
differentially expressed in the disease for which the test sample is to be screened.

"Reference sequences" or "reference polynucleotides" as used herein in 
the context of differential gene expression analysis and diagnosis/prognosis refer to a
selected set of polynucleotides, which selected set includes at least one of the
differentially expressed polynucleotides described herein. A plurality of reference
sequences, preferably comprising positive and negative control sequences, can be
included as reference sequences. Additional suitable reference sequences are found in
GenBank, UniGene, and other nucleotide sequence databases (including, e.g., expressed
sequence tag (EST), partial, and full-length sequences). 

"Reference array" means an array having reference sequences for use in 

hybridization with a sample, where the reference sequences include all, at least one of,
or any subset of the differentially expressed polynucleotides described herein. Usually
such an array will include at least 3 different reference sequences, and can include any 
one or all of the provided differentially expressed sequences. Arrays of interest can 
further comprise sequences, including polymorphisms, of other genetic sequences, 
particularly other sequences of interest for screening for a disease or disorder (e.g.,
cancer, dysplasia, or other related or unrelated diseases, disorders, or conditions). The 
oligonucleotide sequence on the array will usually be at least about 12 nt in length, and 
can be of about the length of the provided sequences, or can extend into the flanking 
regions to generate fragments of 100 nt to 200 nt in length or more. Reference arrays
can be produced according to any suitable methods known in the art. For example,
methods of producing large arrays of oligonucleotides are described in U.S. Patent Nos.
5,134,854 and 5,445,934 using light-directed synthesis techniques. Using a computer
controlled system, a heterogeneous array of monomers is converted, through 
simultaneous coupling at a number of reaction sites, into a heterogeneous array of
polymers. Alternatively, microarrays are generated by deposition of pre-synthesized
oligonucleotides onto a solid substrate, for example as described in PCT published 
application no. WO 95/35505. 

A "reference expression pattern" or "REP" as used herein refers to the 
relative levels of expression of a selected set of genes, particularly of differentially
expressed genes, that is associated with a selected cell type, e.g., a normal cell, a
cancerous cell, a cell exposed to an environmental stimulus, and the like. A "test
expression pattern" or "TEP" refers to relative levels of expression of a selected set of
genes, particularly of differentially expressed genes, in a test sample (e.g., a cell of
unknown or suspected disease state, from which mRNA is isolated).

REPs can be generated in a variety of ways according to methods well
known in the art. For example, REPs can be generated by hybridizing a control sample
to an array having a selected set of polynucleotides (particularly a selected set of 
differentially expressed polynucleotides), acquiring the hybridization data from the 
array, and storing the data in a format that allows for ready comparison of the REP with
a TEP. Alternatively, all expressed sequences in a control sample can be isolated and
sequenced, e.g., by isolating mRNA from a control sample, converting the mRNA into 
cDNA, and sequencing the cDNA. The resulting sequence information roughly or
precisely reflects the identity and relative number of expressed sequences in the sample.
The sequence information can then be stored in a format (e.g., a computer-readable
format) that allows for ready comparison of the REP with a TEP. The REP can be
normalized prior to or after data storage, and/or can be processed to selectively remove
sequences of expressed genes that are of less interest or that might complicate analysis 
(e.g., some or all of the sequences associated with housekeeping genes can be 
eliminated from REP data).

TEPs can be generated in a manner similar to REPs, e.g., by hybridizing
a test sample to an array having a selected set of polynucleotides, particularly a selected
set of differentially expressed polynucleotides, acquiring the hybridization data from the 
array, and storing the data in a format that allows for ready comparison of the TEP with 
a REP. The REP and TEP to be used in a comparison can be generated simultaneously,
or the TEP can be compared to previously generated and stored REPs.

In one embodiment of the invention, comparison of a TEP with a REP 
involves hybridizing a test sample with a reference array, where the reference array has 
one or more reference sequences for use in hybridization with a sample. The reference 
sequences include all, at least one of, or any subset of the differentially expressed
polynucleotides described herein. Hybridization data for the test sample is acquired, the
data normalized, and the produced TEP compared with a REP generated using an array 
having the same or similar selected set of differentially expressed polynucleotides. 
Probes that correspond to sequences differentially expressed between the two samples 
will show decreased or increased hybridization efficiency for one of the samples
relative to the other.

Methods for collection of data from hybridization of samples with
reference arrays are well known in the art. For example, the polynucleotides of the
reference and test samples can be generated using a detectable fluorescent label, and
hybridization of the polynucleotides in the samples detected by scanning the
microarrays for the presence of the detectable label using, for example, a microscope
and light source for directing light at a substrate. A photon counter detects fluorescence 
from the substrate, while an x-y translation stage varies the location of the substrate. A 
confocal detection device that can be used in the subject methods is described in U.S. 
Patent No. 5,631,734. A scanning laser microscope is described in Shalon et al.,
Genome Res. (1996) 6:639. A scan, using the appropriate excitation line, is performed
for each fluorophore used. The digital images generated from the scan are then
combined for subsequent analysis. For any particular array element, the
fluorescent signal from one sample (e.g., a test sample) is compared to the fluorescent
signal from another sample (e.g., a reference sample), and the relative signal intensity
determined.

Methods for analyzing the data collected from hybridization to arrays are
well known in the art. For example, where detection of hybridization involves a
fluorescent label, data analysis can include the steps of determining fluorescent intensity 
as a function of substrate position from the data collected, removing outliers, i.e., data 
deviating from a predetermined statistical distribution, and calculating the relative
binding affinity of the targets from the remaining data. The resulting data can be 
displayed as an image with the intensity in each region varying according to the binding 
affinity between targets and probes. 
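
As a non-limiting sketch of the analysis steps just described, the following computes per-spot signal ratios for replicate spots of one array element, removes outliers, and summarizes the remainder. The description does not specify the statistical distribution used to identify outliers; the median and median-absolute-deviation rule below, the threshold of 3, and all intensity values are assumptions chosen for illustration.

```python
import math
from statistics import mean, median

# Illustrative sketch only: the outlier rule and all readings are assumed.


def log_ratios(test_signals, ref_signals):
    """log2(test / reference) for each pair of spot intensities."""
    return [math.log2(t / r) for t, r in zip(test_signals, ref_signals)]


def remove_outliers(values, max_deviations=3.0):
    """Drop values lying more than max_deviations median absolute
    deviations (MADs) from the median of the values."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return list(values)
    return [v for v in values if abs(v - med) / mad <= max_deviations]


# Replicate spots for one array element: test and reference fluorescence.
test_signals = [200.0, 210.0, 195.0, 205.0, 1600.0]  # last spot is an artifact
ref_signals = [100.0, 100.0, 100.0, 100.0, 100.0]

ratios = log_ratios(test_signals, ref_signals)
kept = remove_outliers(ratios)
relative_signal = 2 ** mean(kept)  # back-transformed test/reference ratio
```

Working in log space treats a two-fold increase and a two-fold decrease symmetrically, which is one common design choice; other outlier criteria could be substituted without changing the overall flow.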



In general, the test sample is classified as having a gene expression
profile corresponding to that associated with a disease or non-disease state by
comparing the TEP generated from the test sample to one or more REPs generated from
reference samples (e.g., from samples associated with cancer or specific stages of
cancer, dysplasia, samples affected by a disease other than cancer, normal samples,
etc.). The criteria for a match or a substantial match between a TEP and a REP include
expression of the same or substantially the same set of reference genes, as well as
expression of these reference genes at substantially the same levels (e.g., no significant
difference between the samples for a signal associated with a selected reference
sequence after normalization of the samples, or at least no greater than about 25% to
about 40% difference in signal strength for a given reference sequence). In general, a
pattern match between a TEP and a REP includes a match in expression, preferably a
match in qualitative or quantitative expression level, of at least one of, all or any subset
of the differentially expressed genes of the invention.

Pattern matching can be performed manually, or can be performed using
a computer program. Methods for preparation of substrate matrices (e.g., arrays),
design of oligonucleotides for use with such matrices, labeling of probes, hybridization
conditions, scanning of hybridized matrices, and analysis of patterns generated,
including comparison analysis, are described in, for example, U.S. Patent No.
5,800,992.
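
A computer-implemented comparison of a TEP with stored REPs could, for example, take the following form. This is a minimal sketch under stated assumptions: patterns are represented as mappings from a gene label to a normalized signal, the 40% tolerance is taken from the upper end of the range given above, and the gene labels, REP names, and signal values are hypothetical.

```python
# Illustrative sketch only: labels, values, and function names are invented.


def patterns_match(tep, rep, max_fraction_diff=0.40):
    """Return True when the TEP and REP cover the same reference genes and
    every TEP signal differs from the REP signal by no more than
    max_fraction_diff, measured relative to the REP signal."""
    if set(tep) != set(rep):
        return False
    for gene, rep_level in rep.items():
        test_level = tep[gene]
        if rep_level == 0:
            # No relative difference can be formed; require an exact match.
            if test_level != 0:
                return False
        elif abs(test_level - rep_level) / rep_level > max_fraction_diff:
            return False
    return True


def classify(tep, reps, max_fraction_diff=0.40):
    """Names of all stored REPs that the TEP substantially matches."""
    return [name for name, rep in reps.items()
            if patterns_match(tep, rep, max_fraction_diff)]


# Stored REPs: normalized signal per reference gene for each reference state.
stored_reps = {
    "normal": {"g1": 1.0, "g2": 1.0, "g3": 1.0},
    "metastatic": {"g1": 4.0, "g2": 0.25, "g3": 1.0},
}
test_pattern = {"g1": 3.6, "g2": 0.3, "g3": 1.1}

matches = classify(test_pattern, stored_reps)
```

Here the test pattern falls within tolerance of the hypothetical "metastatic" REP only. A stricter tolerance (e.g., 25%) or a requirement that only a subset of genes match could be substituted by changing the parameter or the comparison loop.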

Diagnosis, Prognosis and Management of Cancer

The polynucleotides of the invention and their gene products are of
particular interest as genetic or biochemical markers (e.g., in blood or tissues) that can
be used to detect the earliest changes along the carcinogenesis pathway and/or to monitor the
efficacy of various therapies and preventive interventions. For example, the level of
expression of certain polynucleotides can be indicative of a poorer prognosis, and
therefore warrant more aggressive chemo- or radio-therapy for a patient or vice versa.
The correlation of novel surrogate tumor specific features with response to treatment 
and outcome in patients can define prognostic indicators that allow the design of 
tailored therapy based on the molecular profile of the tumor. These therapies include 
antibody targeting and gene therapy. Determining expression of certain polynucleotides
and comparison of a patient's profile with known expression in normal tissue and
variants of the disease allows a determination of the best possible treatment for a 
patient, both in terms of specificity of treatment and in terms of comfort level of the 
patient. Surrogate tumor markers, such as polynucleotide expression, can also be used
to better classify, and thus diagnose and treat, different forms and disease states of
cancer. Two classifications widely used in oncology that can benefit from identification 
of the expression levels of the polynucleotides of the invention are staging of the 
cancerous disorder, and grading the nature of the cancerous tissue. 

The polynucleotides of the invention can be useful to monitor patients
having or susceptible to cancer to detect potentially malignant events at a molecular
level before they are detectable at a gross morphological level. Furthermore, a 
polynucleotide of the invention identified as important for one type of cancer can also 
have implications for development or risk of development of other types of cancer, e.g., 
where a polynucleotide is differentially expressed across various cancer types. Thus,
for example, expression of a polynucleotide that has clinical implications for metastatic
colon cancer can also have clinical implications for stomach cancer or endometrial 
cancer. 

Staging. Staging is a process used by physicians to describe how
advanced the cancerous state is in a patient. Generally, if a cancer is only detectable in
the area of the primary lesion without having spread to any lymph nodes it is called
Stage I. If it has spread only to the closest lymph nodes, it is called Stage II. In Stage
III, the cancer has generally spread to the lymph nodes in near proximity to the site of 
the primary lesion. Cancers that have spread to a distant part of the body, such as the 
liver, bone, brain or other site, are Stage IV, the most advanced stage. 

The polynucleotides of the invention can facilitate fine-tuning of the
staging process by identifying markers for the aggressivity of a cancer, e.g., the
metastatic potential, as well as the presence in different areas of the body. Thus,
detection in a Stage II cancer of a polynucleotide signifying high metastatic potential
can be used to change a borderline Stage II tumor to a Stage III tumor, justifying more
aggressive therapy. Conversely, the presence of a polynucleotide signifying a lower metastatic
potential allows more conservative staging of a tumor. 

Grading of cancers. Grade is a term used to describe how closely a
tumor resembles normal tissue of its same type. The microscopic appearance of a tumor
is used to identify tumor grade based on parameters such as cell morphology, cellular
organization, and other markers of differentiation. As a general rule, the grade of a
tumor corresponds to its rate of growth or aggressiveness, with undifferentiated or
high-grade tumors being more aggressive than well differentiated or low-grade tumors. The
following guidelines are generally used for grading tumors: 1) GX Grade cannot be
assessed; 2) G1 Well differentiated; 3) G2 Moderately well differentiated; 4) G3 Poorly
differentiated; 5) G4 Undifferentiated. The polynucleotides of the invention can be
especially valuable in determining the grade of the tumor, as they not only can aid in 
determining the differentiation status of the cells of a tumor, they can also identify 
factors other than differentiation that are valuable in determining the aggressivity of a
tumor, such as metastatic potential.

Detection of lung cancer. The polynucleotides of the invention can be
used to detect lung cancer in a subject. Although there are more than a dozen different 
kinds of lung cancer, the two main types of lung cancer are small cell and nonsmall cell, 
which encompass about 90% of all lung cancer cases. Small cell carcinoma (also called
oat cell carcinoma) usually starts in one of the larger bronchial tubes, grows fairly
rapidly, and is likely to be large by the time of diagnosis. Nonsmall cell lung cancer
(NSCLC) is made up of three general subtypes of lung cancer. Epidermoid carcinoma
(also called squamous cell carcinoma) usually starts in one of the larger bronchial tubes
and grows relatively slowly. The size of these tumors can range from very small to
quite large. Adenocarcinoma starts growing near the outside surface of the lung and can
vary in both size and growth rate. Some slowly growing adenocarcinomas are described 
as alveolar cell cancer. Large cell carcinoma starts near the surface of the lung, grows 
rapidly, and the growth is usually fairly large when diagnosed. Other less common 
forms of lung cancer are carcinoid, cylindroma, mucoepidermoid, and malignant
mesothelioma.

The polynucleotides of the invention, e.g., polynucleotides differentially 
expressed in normal cells versus cancerous lung cells (e.g., tumor cells of high or low 
metastatic potential) or between types of cancerous lung cells (e.g., high metastatic 
versus low metastatic), can be used to distinguish types of lung cancer as well as
to identify traits specific to a certain patient's cancer and to select an appropriate
therapy. For example, if the patient's biopsy expresses a polynucleotide that is
associated with a low metastatic potential, it may justify leaving a larger portion of the 
patient's lung in surgery to remove the lesion. Alternatively, a smaller lesion with 
expression of a polynucleotide that is associated with high metastatic potential may 
justify a more radical removal of lung tissue and/or the surrounding lymph nodes, even
if no metastasis can be identified through pathological examination. 

Detection of breast cancer. The majority of breast cancers are
adenocarcinoma subtypes, which can be summarized as follows: 1) ductal carcinoma
in situ (DCIS), including comedocarcinoma; 2) infiltrating (or invasive) ductal
carcinoma (IDC); 3) lobular carcinoma in situ (LCIS); 4) infiltrating (or invasive)
lobular carcinoma (ILC); 5) inflammatory breast cancer; 6) medullary carcinoma; 
7) mucinous carcinoma; 8) Paget's disease of the nipple; 9) Phyllodes tumor; and 
10) tubular carcinoma. 

The expression of polynucleotides of the invention can be used in the
diagnosis and management of breast cancer, as well as to distinguish between types of
breast cancer. Breast cancer can be detected using expression levels of
any of the appropriate polynucleotides of the invention, either alone or in combination.
The aggressive nature and/or the metastatic potential of a breast cancer
can also be determined by comparing levels of one or more polynucleotides of the
invention with levels of another sequence known to vary in cancerous tissue,
e.g., ER expression. In addition, development of breast cancer can be detected by
examining the ratio of expression of a differentially expressed polynucleotide to the 
levels of steroid hormones (e.g., testosterone or estrogen) or to other hormones (e.g., 
growth hormone, insulin). Thus expression of specific marker polynucleotides can be
used to discriminate between normal and cancerous breast tissue, to discriminate
between breast cancers with different cells of origin, to discriminate between breast 
cancers with different potential metastatic rates, etc. 

Detection of colon cancer. The polynucleotides of the invention
exhibiting the appropriate expression pattern can be used to detect colon cancer in a
subject. Colorectal cancer is one of the most common neoplasms in humans and
perhaps the most frequent form of hereditary neoplasia. Prevention and early detection 
are key factors in controlling and curing colorectal cancer. Colorectal cancer begins as 
polyps, which are small, benign growths of cells that form on the inner lining of the 
colon. Over a period of several years, some of these polyps accumulate additional
mutations and become cancerous. Multiple familial colorectal cancer disorders have
been identified, which are summarized as follows: 1) Familial adenomatous polyposis
(FAP); 2) Gardner's syndrome; 3) Hereditary nonpolyposis colon cancer (HNPCC); and
4) Familial colorectal cancer in Ashkenazi Jews. The expression of appropriate
polynucleotides of the invention can be used in the diagnosis, prognosis and
management of colorectal cancer. Detection of colon cancer can be determined using
expression levels of any of these sequences alone or in combination with the levels of 
expression. The aggressive nature and/or the metastatic potential of a
colon cancer can be determined by comparing levels of one or more polynucleotides of
the invention with total levels of another sequence known to vary in
cancerous tissue, e.g., expression of p53, DCC, ras, or FAP (see, e.g., Fearon ER, et al.,
Cell (1990) 61(5):759; Hamilton SR et al., Cancer (1993) 72:957; Bodmer W, et al.,
Nat Genet. (1994) 6(3):217; Fearon ER, Ann N Y Acad Sci (1995) 768:101). For
example, development of colon cancer can be detected by examining the ratio of any of
the polynucleotides of the invention to the levels of oncogenes (e.g., ras) or tumor
suppressor genes (e.g., FAP or p53). Thus expression of specific marker
polynucleotides can be used to discriminate between normal and cancerous colon tissue, 
to discriminate between colon cancers with different cells of origin, to discriminate 
between colon cancers with different potential metastatic rates, etc. 



Use of Polynucleotides to Screen for Peptide Analogs and Antagonists 
Polypeptides encoded by the instant polynucleotides and corresponding
full length genes can be used to screen peptide libraries to identify binding partners,
such as receptors, from among the encoded polypeptides. Peptide libraries can be
synthesized according to methods known in the art (see, e.g., U.S. Patent No. 5,010,175,
and WO 91/17823). Agonists or antagonists of the polypeptides of the invention can be
screened using any available method known in the art, such as signal transduction,
antibody binding, receptor binding, mitogenic assays, chemotaxis assays, etc. The 
assay conditions ideally should resemble the conditions under which the native activity 
is exhibited in vivo, that is, under physiologic pH, temperature, and ionic strength. 
Suitable agonists or antagonists will exhibit strong inhibition or enhancement of the 
native activity at concentrations that do not cause toxic side effects in the subject.
Agonists or antagonists that compete for binding to the native polypeptide can require 
concentrations equal to or greater than the native concentration, while inhibitors capable 
of binding irreversibly to the polypeptide can be added in concentrations on the order of 
the native concentration. 
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Such screening and experimentation can lead to identification of a novel 
polypeptide binding partner, such as a receptor, encoded by a gene or a cDNA 
corresponding to a polynucleotide of the invention, and at least one peptide agonist or 
antagonist of the novel binding partner. Such agonists and antagonists can be used to 
modulate, enhance, or inhibit receptor function in cells to which the receptor is native,
or in cells that possess the receptor as a result of genetic engineering. Further, if the 
novel receptor shares biologically important characteristics with a known receptor, 
information about agonist/antagonist binding can facilitate development of improved 
agonists/antagonists of the known receptor. 

Pharmaceutical Compositions and Therapeutic Uses

Pharmaceutical compositions of the invention can comprise
polypeptides, antibodies, or polynucleotides (including antisense nucleotides and
ribozymes) of the claimed invention in a therapeutically effective amount. The term
"therapeutically effective amount" as used herein refers to an amount of a therapeutic
agent sufficient to treat, ameliorate, or prevent a desired disease or condition, or to exhibit a
detectable therapeutic or preventative effect. The effect can be detected by, for
example, chemical markers or antigen levels. Therapeutic effects also include reduction
in physical symptoms, such as decreased body temperature. The precise effective
amount for a subject will depend upon the subject's size and health, the nature and
extent of the condition, and the therapeutics or combination of therapeutics selected for
administration. Thus, it is not useful to specify an exact effective amount in advance.
However, the effective amount for a given situation is determined by routine
experimentation and is within the judgment of the clinician. For purposes of the present
invention, an effective dose will generally be from about 0.01 mg/kg to 50 mg/kg or
0.05 mg/kg to about 10 mg/kg of the DNA constructs in the individual to which it is
administered.
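
As a worked illustration of the per-kilogram ranges stated above, the following Python sketch converts a mg/kg range into a total amount for a given body weight. The function name and the 70 kg example weight are hypothetical; the snippet shows the multiplication only and does not select a dose.

```python
from typing import Tuple


def dose_range_mg(weight_kg: float,
                  low_mg_per_kg: float,
                  high_mg_per_kg: float) -> Tuple[float, float]:
    """Scale a per-kilogram dose range to a total amount in mg."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if low_mg_per_kg > high_mg_per_kg:
        raise ValueError("low bound exceeds high bound")
    return weight_kg * low_mg_per_kg, weight_kg * high_mg_per_kg


# Broad range from the text: about 0.01 mg/kg to 50 mg/kg.
broad = dose_range_mg(70.0, 0.01, 50.0)
# Narrower range from the text: 0.05 mg/kg to about 10 mg/kg.
narrow = dose_range_mg(70.0, 0.05, 10.0)
```

For the hypothetical 70 kg individual, the broad range works out to roughly 0.7 mg to 3500 mg and the narrower range to roughly 3.5 mg to 700 mg.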

A pharmaceutical composition can also contain a pharmaceutically 
acceptable carrier. The term "pharmaceutically acceptable carrier" refers to a carrier for 
administration of a therapeutic agent, such as antibodies or a polypeptide, genes, and 
other therapeutic agents. The term refers to any pharmaceutical carrier that does not
itself induce the production of antibodies harmful to the individual receiving the 
composition, and which can be administered without undue toxicity. Suitable carriers 
can be large, slowly metabolized macromolecules such as proteins, polysaccharides, 
polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers, 
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and inactive virus particles. Such carriers are well known to those of ordinary skill in 
the art. Pharmaceutically acceptable carriers in therapeutic compositions can include 
liquids such as water, saline, glycerol and ethanol. Auxiliary substances, such as 
wetting or emulsifying agents, pH buffering substances, and the like, can also be present 
in such vehicles. Typically, the therapeutic compositions are prepared as injectables,
either as liquid solutions or suspensions; solid forms suitable for solution in, or 
suspension in, liquid vehicles prior to injection can also be prepared. Liposomes are 
included within the definition of a pharmaceutically acceptable carrier. 
Pharmaceutically acceptable salts can also be present in the pharmaceutical
composition, e.g., mineral acid salts such as hydrochlorides, hydrobromides,
phosphates, sulfates, and the like; and the salts of organic acids such as acetates, 
propionates, malonates, benzoates, and the like. A thorough discussion of 
pharmaceutically acceptable excipients is available in Remington's Pharmaceutical 
Sciences (Mack Pub. Co., New Jersey, 1991). 

Delivery Methods. Once formulated, the compositions of the invention
can be (1) administered directly to the subject (e.g., as polynucleotide or polypeptides);
or (2) delivered ex vivo, to cells derived from the subject (e.g., as in ex vivo gene
therapy). Direct delivery of the compositions will generally be accomplished by
parenteral injection, e.g., subcutaneously, intraperitoneally, intravenously or
intramuscularly, intratumorally or to the interstitial space of a tissue. Other modes of
administration include oral and pulmonary administration, suppositories, and
transdermal applications, needles, and gene guns or hyposprays. Dosage treatment can
be a single dose schedule or a multiple dose schedule.



Methods for the ex vivo delivery and reimplantation of transformed cells
into a subject are known in the art and described in e.g., International Publication No.
WO 93/14778. Examples of cells useful in ex vivo applications include, for example,
stem cells, particularly hematopoietic, lymph cells, macrophages, dendritic cells, or
tumor cells. Generally, delivery of nucleic acids for both ex vivo and in vitro
applications can be accomplished by, for example, dextran-mediated transfection,
calcium phosphate precipitation, polybrene mediated transfection, protoplast fusion,
electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct
microinjection of the DNA into nuclei, all well known in the art.

Once a gene corresponding to a polynucleotide of the invention has been
found to correlate with a proliferative disorder, such as neoplasia, dysplasia, and
hyperplasia, the disorder can be amenable to treatment by administration of a
therapeutic agent based on the provided polynucleotide, corresponding polypeptide or
other corresponding molecule (e.g., antisense, ribozyme, etc.).



The dose and the means of administration of the inventive
pharmaceutical compositions are determined based on the specific qualities of the
therapeutic composition, the condition, age, and weight of the patient, the progression
of the disease, and other relevant factors. For example, administration of
polynucleotide therapeutic compositions of the invention includes local or
systemic administration, including injection, oral administration, particle gun or
catheterized administration, and topical administration. Preferably, the therapeutic
polynucleotide composition contains an expression construct comprising a promoter
operably linked to a polynucleotide of at least 12, 22, 25, 30, or 35 contiguous nt of the
polynucleotide disclosed herein. Various methods can be used to administer the
therapeutic composition directly to a specific site in the body. For example, a small
metastatic lesion is located and the therapeutic composition injected several times in
several different locations within the body of the tumor. Alternatively, arteries which serve
a tumor are identified, and the therapeutic composition injected into such an artery, in
order to deliver the composition directly into the tumor. A tumor that has a necrotic
center is aspirated and the composition injected directly into the now empty center of
the tumor. The antisense composition is directly administered to the surface of the
tumor, for example, by topical application of the composition. X-ray imaging is used to
assist in certain of the above delivery methods.

Receptor-mediated targeted delivery of therapeutic compositions
containing an antisense polynucleotide, subgenomic polynucleotides, or antibodies to
specific tissues can also be used. Receptor-mediated DNA delivery techniques are
described in, for example, Findeis et al., Trends Biotechnol. (1993) 11:202; Chiou et al.,
Gene Therapeutics: Methods And Applications Of Direct Gene Transfer (J.A. Wolff,
ed.) (1994); Wu et al., J. Biol. Chem. (1988) 263:621; Wu et al., J. Biol. Chem. (1994)
269:542; Zenke et al., Proc. Natl. Acad. Sci. (USA) (1990) 87:3655; Wu et al., J. Biol.
Chem. (1991) 266:338. Therapeutic compositions containing a polynucleotide are
administered in a range of about 100 ng to about 200 mg of DNA for local
administration in a gene therapy protocol. Concentration ranges of about 500 ng to
about 50 mg, about 1 mg to about 2 mg, about 5 mg to about 500 mg, and about 20 mg
to about 100 mg of DNA can also be used during a gene therapy protocol. Factors such
as method of action (e.g., for enhancing or inhibiting levels of the encoded gene
product) and efficacy of transformation and expression are considerations which will
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affect the dosage required for ultimate efficacy of the antisense subgenomic 
polynucleotides. Where greater expression is desired over a larger area of tissue, larger
amounts of antisense subgenomic polynucleotides or the same amounts readministered
in a successive protocol of administrations, or several administrations to different
adjacent or close tissue portions of, for example, a tumor site, may be required to effect
a positive therapeutic outcome. In all cases, routine experimentation in clinical trials
will determine specific ranges for optimal therapeutic effect. For polynucleotide-related
genes encoding polypeptides or proteins with anti-inflammatory activity, suitable use,
doses, and administration are described in U.S. Patent No. 5,654,173.

The therapeutic polynucleotides and polypeptides of the present
invention can be delivered using gene delivery vehicles. The gene delivery vehicle can
be of viral or non-viral origin (see generally, Jolly, Cancer Gene Therapy (1994) 7:51;
Kimura, Human Gene Therapy (1994) 5:845; Connelly, Human Gene Therapy (1995)
7:185; and Kaplitt, Nature Genetics (1994) 6:148). Expression of such coding
sequences can be induced using endogenous mammalian or heterologous promoters.
Expression of the coding sequence can be either constitutive or regulated.

Viral-based vectors for delivery of a desired polynucleotide and
expression in a desired cell are well known in the art. Exemplary viral-based vehicles
include, but are not limited to, recombinant retroviruses (see, e.g., WO 90/07936; WO
94/03622; WO 93/25698; WO 93/25234; U.S. Patent No. 5,219,740; WO 93/11230;
WO 93/10218; U.S. Patent No. 4,777,127; GB Patent No. 2,200,651; EP 0 345 242; and
WO 91/02805), alphavirus-based vectors (e.g., Sindbis virus vectors, Semliki forest
virus (ATCC VR-67; ATCC VR-1247), Ross River virus (ATCC VR-373; ATCC VR-1246)
and Venezuelan equine encephalitis virus (ATCC VR-923; ATCC VR-1250;
ATCC VR 1249; ATCC VR-532)), and adeno-associated virus (AAV) vectors (see, e.g.,
WO 94/12649, WO 93/03769; WO 93/19191; WO 94/28938; WO 95/11984 and WO
95/00655). Administration of DNA linked to killed adenovirus as described in Curiel,
Hum. Gene Ther. (1992) 3:147 can also be employed.

Non-viral delivery vehicles and methods can also be employed,
including, but not limited to, polycationic condensed DNA linked or unlinked to killed
adenovirus alone (see, e.g., Curiel, Hum. Gene Ther. (1992) 3:147); ligand-linked
DNA (see, e.g., Wu, J. Biol. Chem. 264:16985 (1989)); eukaryotic cell delivery vehicle
cells (see, e.g., U.S. Patent No. 5,814,482; WO 95/07994; WO 96/17072;
WO 95/30763; and WO 97/42338) and nucleic charge neutralization or fusion with cell
membranes. Naked DNA can also be employed. Exemplary naked DNA introduction
methods are described in WO 90/11092 and U.S. Patent No. 5,580,859. Liposomes that
can act as gene delivery vehicles are described in U.S. Patent No. 5,422,120; WO
95/13796; WO 94/23697; WO 91/14445; and EP 0524968. Additional approaches are
described in Philip, Mol. Cell Biol. 14:2411 (1994), and in Woffendin, Proc. Natl.
Acad. Sci. (1994) 91:1581.

Further non-viral delivery suitable for use includes mechanical delivery
systems such as the approach described in Woffendin et al., Proc. Natl. Acad. Sci. USA
91(24):11581 (1994). Moreover, the coding sequence and the product of expression of
such can be delivered through deposition of photopolymerized hydrogel materials or
use of ionizing radiation (see, e.g., U.S. Patent No. 5,206,152 and WO 92/11033).
Other conventional methods for gene delivery that can be used for delivery of the
coding sequence include, for example, use of hand-held gene transfer particle gun (see,
e.g., U.S. Patent No. 5,149,655); use of ionizing radiation for activating transferred gene
(see, e.g., U.S. Patent No. 5,206,152 and WO 92/11033).

The present invention will now be illustrated by reference to the
following examples which set forth particularly advantageous embodiments. However,
it should be noted that these embodiments are illustrative and are not to be construed as
restricting the invention in any way.
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i 

EXAMPLES 
EXAMPLE 1 

Source of Biological Materials and Overview of Novel Polynucleotides 

Expressed by the Biological Materials 


Cell lines and human normal and tumor tissue were used to construct
cDNA libraries from mRNA isolated from the cells and tissues. Most sequences were
about 275-300 nucleotides in length. The cell lines include the KM12L4-A cell line, a
highly metastatic colon cancer cell line (Morikawa, K. et al., Cancer Research (1988)
48:6863). The KM12L4-A cell line is derived from the KM12C cell line. The KM12C
cell line, which is poorly metastatic (low metastatic), was established in culture from a
Dukes' stage B2 surgical specimen (Morikawa et al. Cancer Res. (1988) 48:6863). The
KM12L4-A is a highly metastatic subline derived from KM12C (Yeatman et al. Nucl.
Acids. Res. (1995) 23:4007; Bao-Ling et al. Proc. Annu. Meet. Am. Assoc. Cancer. Res.
(1995) 27:3269). The KM12C and KM12C-derived cell lines (e.g., KM12L4,
KM12L4-A, etc.) are well-recognized in the art as model cell lines for the study of
colon cancer (see, e.g., Morikawa et al., supra; Radinsky et al. Clin. Cancer Res.
(1995) 1:19; Yeatman et al., (1995) supra; Yeatman et al., Clin. Exp. Metastasis (1996)
14:246). These and other cell lines and tissue are described in Table 6.

The sequences of the isolated polynucleotides were first masked to
eliminate low complexity sequences using the XBLAST masking program (Claverie
"Effective Large-Scale Sequence Similarity Searches," In: Computer Methods for
Macromolecular Sequence Analysis, Doolittle, ed., Meth. Enzymol. 266:212-227
Academic Press, NY, NY (1996); see particularly Claverie, in "Automated DNA
Sequencing and Analysis Techniques" Adams et al., eds., Chap. 36, p. 267 Academic
Press, San Diego, 1994 and Claverie et al. Comput. Chem. (1993) 17:191). Generally,
masking does not influence the final search results, except to eliminate sequences of
relatively little interest due to their low complexity, and to eliminate multiple "hits" based
on similarity to repetitive regions common to multiple sequences, e.g., Alu repeats. The
sequences remaining after masking were then used in a BLASTN vs. Genbank search;
sequences that exhibited greater than 70% overlap, 99% identity, and a p value of less
than 1 x 10^-40 were discarded. Sequences from this search also were discarded if the
inclusive parameters were met, but the sequence was ribosomal or vector-derived.
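
The discard rule applied after the BLASTN search can be sketched as a simple predicate. The Python below is a minimal illustration under stated assumptions: the `BlastHit` record and its field names are hypothetical, the identity cutoff is read as "at least 99%", the default p value cutoff is assumed to match the 1 x 10^-40 value used in the later searches, and a hit flagged as ribosomal or vector-derived is treated as discarded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlastHit:
    """Best database hit for one query sequence (hypothetical record layout)."""
    overlap_pct: float
    identity_pct: float
    p_value: float
    ribosomal_or_vector: bool = False


def is_discarded(hit: BlastHit,
                 min_overlap: float = 70.0,
                 min_identity: float = 99.0,
                 max_p: float = 1e-40) -> bool:
    """True when the query is treated as already known and dropped."""
    if hit.ribosomal_or_vector:
        return True
    return (hit.overlap_pct > min_overlap
            and hit.identity_pct >= min_identity
            and hit.p_value < max_p)


hits: List[BlastHit] = [
    BlastHit(overlap_pct=95.0, identity_pct=99.5, p_value=1e-60),  # known sequence
    BlastHit(overlap_pct=95.0, identity_pct=99.5, p_value=1e-10),  # weak evidence, kept
    BlastHit(overlap_pct=50.0, identity_pct=80.0, p_value=1e-3,
             ribosomal_or_vector=True),                            # contaminant
]
kept = [h for h in hits if not is_discarded(h)]
```

Exposing the cutoffs as parameters lets the same predicate be reused for the later database searches, which apply different thresholds.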

<6f 
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The resulting sequences from the previous search were classified into 
three groups (1, 2 and 3 below) and searched in a BLASTX vs. NRP (non-redundant
proteins) database search: (1) unknown (no hits in the Genbank search), (2) weak
similarity (greater than 45% identity and p value of less than 1 x 10^-5), and (3) high
similarity (greater than 60% overlap, greater than 80% identity, and p value less than 1
x 10^-5). Sequences having greater than 70% overlap, greater than 99% identity, and p
value of less than 1 x 10^-40 were discarded.
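
The three-way grouping above can be expressed as a small classifier. In this sketch the function name and the tuple layout (overlap %, identity %, p value) are hypothetical, and a hit that satisfies neither set of criteria is assumed to fall back to "unknown", which the text does not state explicitly.

```python
from typing import Optional, Tuple

# (overlap_pct, identity_pct, p_value) of the best Genbank hit, or None for no hit.
Hit = Optional[Tuple[float, float, float]]


def classify_hit(best_hit: Hit) -> str:
    """Assign a sequence to the unknown, weak similarity, or high similarity group."""
    if best_hit is None:
        return "unknown"
    overlap_pct, identity_pct, p_value = best_hit
    if overlap_pct > 60 and identity_pct > 80 and p_value < 1e-5:
        return "high similarity"
    if identity_pct > 45 and p_value < 1e-5:
        return "weak similarity"
    # Assumption: a hit below both thresholds is treated like no hit.
    return "unknown"


groups = [classify_hit(h) for h in (None, (75, 90, 1e-20), (30, 50, 1e-8), (30, 40, 0.1))]
```

The high similarity test is applied first because any hit that passes it also passes the weaker criteria.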

The remaining sequences were classified as unknown (no hits), weak
similarity, and high similarity (parameters as above). Two searches were performed on
these sequences. First, a BLAST vs. EST database search was performed and
sequences with greater than 99% overlap, greater than 99% similarity and a p value of
less than 1 x 10^-40 were discarded. Sequences with a p value of less than 1 x 10^-65 when
compared to a database sequence of human origin were also excluded. Second, a
BLASTN vs. Patent GeneSeq database was performed and sequences having greater
than 99% identity, p value less than 1 x 10^-40, and greater than 99% overlap were
discarded.

The remaining sequences were subjected to screening using other rules 
and redundancies in the dataset. Sequences with a p value of less than 1 x 10^-111 in
relation to a database sequence of human origin were specifically excluded. The final

result provided the 3351 sequences listed in the accompanying Sequence Listing. Each
identified polynucleotide represents sequence from at least a partial mRNA transcript. 
Polynucleotides that were determined to be novel were assigned a sequence 
identification number. 

The novel polynucleotides were assigned sequence identification numbers
SEQ ID NOs: 1-3351. The first 1847 DNA sequences corresponding to the novel
polynucleotides are provided in the Sequence Listing in Table 1. DNA sequences
corresponding to the novel polynucleotides of SEQ ID NOs: 1848-3351 are provided in the
Sequence Listing in Table 2. The DNA sequences of Table 2, while numbered SEQ ID
1-1504, correspond to SEQ ID NOs: 1848-3351 in the Sequence Listing, e.g., Table 2 SEQ ID
1 is SEQ ID NO: 1848, Table 2 SEQ ID 2 is SEQ ID NO: 1849, etc. Each DNA sequence in
Table 4 is uniquely identified by a number that is 1847 less than its SEQ ID NO in the
Sequence Listing. Tables 1 and 2 provide: 1) the SEQ ID NO assigned to each sequence
for use in the present specification or a corresponding number; 2) the sequence name used
as an internal identifier of the sequence; 3) the name assigned to the clone from which the
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sequence was isolated; and 4) the number of the cluster to which the sequence is assigned 
(Cluster ID; where the cluster ID is 0, the sequence was not assigned to any cluster). 
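
The numbering relationship between the table-local identifiers and the SEQ ID NOs is a fixed offset of 1847. A minimal Python sketch of the conversion in both directions follows; the function and constant names are hypothetical.

```python
OFFSET = 1847          # number of sequences that precede the second table
LOCAL_COUNT = 1504     # table-local identifiers run from 1 to 1504


def local_to_seq_id_no(local_id: int) -> int:
    """Convert a table-local SEQ ID (1-1504) to its SEQ ID NO (1848-3351)."""
    if not 1 <= local_id <= LOCAL_COUNT:
        raise ValueError("local ID out of range")
    return local_id + OFFSET


def seq_id_no_to_local(seq_id_no: int) -> int:
    """Convert a SEQ ID NO (1848-3351) back to its table-local SEQ ID."""
    if not OFFSET + 1 <= seq_id_no <= OFFSET + LOCAL_COUNT:
        raise ValueError("SEQ ID NO out of range")
    return seq_id_no - OFFSET


first = local_to_seq_id_no(1)      # 1848
last = local_to_seq_id_no(1504)    # 3351
```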

Because the provided polynucleotides represent partial mRNA 
transcripts, two or more polynucleotides of the invention may represent different 
regions of the same mRNA transcript and the same gene. Thus, if two or more SEQ ID
NOs: are identified as belonging to the same clone, then either sequence can be used to 
obtain the full-length mRNA or gene. 

EXAMPLE 2 

Results of Public Database Search to Identify Function of Gene Products 


SEQ ID NOs: 1-3351 were translated in all three reading frames to 
determine the best alignment with the individual sequences. These amino acid 
sequences and nucleotide sequences are referred to, generally, as query sequences, 
which are aligned with the individual sequences. Query and individual sequences were 

aligned using the BLAST programs, available over the world wide web at
http://www.ncbi.nlm.nih.gov/BLAST/. Again the sequences were masked to various 
extents to prevent searching of repetitive sequences or poly-A sequences, using the 
XBLAST program for masking low complexity as described above in Example 1. 
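
Translation of a nucleotide sequence in the three forward reading frames can be sketched as follows. This is an illustrative Python implementation using the standard genetic code, not the software used for the searches; the function names and the short test sequence are hypothetical, and codons containing ambiguous bases are rendered as "X".

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str, frame: int) -> str:
    """Translate seq starting at offset frame (0, 1 or 2); '*' marks a stop codon."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(seq: str) -> list:
    """Return the translations for reading frames 1, 2 and 3."""
    return [translate(seq, frame) for frame in range(3)]


frames = three_frame_translation("ATGGCCTAA")
```

Each translated frame can then be used as a protein query, and the frame giving the best alignment is retained.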

Tables 3 and 4 (inserted before the claims) show the results of the
alignments. Table 3 contains alignment information for SEQ ID NOs: 1-1847 and Table 4
contains alignment information for SEQ ID NOs: 1848-3351. The DNA sequences of Table
4, while numbered SEQ ID 1-1504, correspond to SEQ ID NOs: 1848-3351. Each DNA
sequence in Table 4 is uniquely identified by a number that is 1847 less than its SEQ ID
NO. Tables 3 and 4 refer to each sequence by its SEQ ID NO or a corresponding number,
the accession numbers and descriptions of nearest neighbors from the Genbank and Non-
Redundant Protein searches, and the p values of the search results.

For each of SEQ ID NOs:1-1847, the best alignment to a protein or DNA
sequence is included in Table 3, and the best alignment for each of SEQ ID NOs: 1848-3351
is included in Table 4. The activity of the polypeptide encoded by SEQ ID
NOs: 1-3351 is the same or similar to the nearest neighbor reported in Table 3 or 4. The
accession number of the nearest neighbor is reported, providing a reference to the activities
exhibited by the nearest neighbor. The search program and database used for the alignment
also are indicated as well as a calculation of the p value.
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Full length sequences or fragments of the polynucleotide sequences of 
the nearest neighbors can be used as probes and primers to identify and isolate the full 
length sequence of SEQ ID NOs: 1-3351. The nearest neighbors can indicate a tissue or 
cell type to be used to construct a library for the full-length sequences of SEQ ID 
NOs:1-3351.

EXAMPLE 3 
Members of Protein Families 

The sequences (SEQ ID NOs: 1-3351) were used to conduct a profile
search as described in the specification above. Several of the polynucleotides of the
invention were found to encode polypeptides having characteristics of a polypeptide
belonging to a known protein family (and thus represent new members of these
protein families) and/or comprising a known functional domain (Table 5). "Start" and
"stop" in Table 3 indicate the position within the individual sequences that align with
the query sequence having the indicated SEQ ID NO. The direction indicates the
orientation of the query sequence with respect to the individual sequence, where
forward (for) indicates that the alignment is in the same direction (left to right) as the
sequence provided in the Sequence Listing and reverse (rev) indicates that the
alignment is with a sequence complementary to the sequence provided in the Sequence
Listing.
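
The meaning of the "for" and "rev" direction labels can be illustrated with a short Python sketch. The function names are hypothetical, and the reverse orientation is assumed to mean the reverse complement of the listed sequence.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def oriented_sequence(seq: str, direction: str) -> str:
    """Return the strand that takes part in the alignment for a 'for' or 'rev' hit."""
    if direction == "for":
        return seq
    if direction == "rev":
        return reverse_complement(seq)
    raise ValueError("direction must be 'for' or 'rev'")


forward = oriented_sequence("ATGC", "for")
reverse = oriented_sequence("ATGC", "rev")
```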

Some polynucleotides exhibited multiple profile hits because, for 
example, the particular sequence contains overlapping profile regions, and/or the 
sequence contains two different functional domains. These profile hits are described in 
more detail below. 

Ank Repeats (ANK). SEQ ID NOs:187, 1268, 1804, 1819, 1830, 1839,
2652, 3015 and 3267 represent polynucleotides encoding an Ank repeat-containing
protein. The ankyrin motif is a 33 amino acid sequence named for the protein ankyrin
which has 24 tandem 33-amino-acid motifs. Ank repeats were originally identified in
the cell-cycle-control protein cdc10 (Breeden et al., Nature (1987) 329:651). Proteins
containing ankyrin repeats include ankyrin, myotropin, I-kappaB proteins, cell cycle
protein cdc10, the Notch receptor (Matsuno et al., Development (1997) 124(21):4265);
G9a (or BAT8) of the class III region of the major histocompatibility complex
(Biochem J. 290:811-818, 1993), FABP, GABP, 53BP2, Lin12, glp-1, SWI4, and
SWI6. The functions of the ankyrin repeats are compatible with a role in protein-
protein interactions (Bork, Proteins (1993) 17(4):363; Lambert and Bennet, Eur. J.
Biochem. (1993) 211:1; Kerr et al., Current Op. Cell Biol. (1992) 4:496; Bennet et al.,
J. Biol. Chem. (1980) 255:6424).

ATPases Associated with Various Cellular Activities (ATPases).
Sequences within SEQ ID NOs:431, 639, 2135, 2684, 2859, 3197 and 3266 correspond
to a sequence that encodes a novel member of the "ATPases Associated with diverse
cellular Activities" (AAA) protein family. The AAA protein family is composed of a
large number of ATPases that share a conserved region of about 220 amino acids that
contains an ATP-binding site (Froehlich et al., J. Cell Biol. (1991) 114:443; Erdmann et
al., Cell (1991) 64:499; Peters et al., EMBO J. (1990) 9:1757; Kunau et al., Biochimie
(1993) 75:209-224; Confalonieri et al., BioEssays (1995) 17:639;
http://yeamob.pci.chemie.uni-tuebingen.de/AAA/Description.html). The proteins that
belong to this family contain either one or two AAA domains. In general, the AAA
domains in these proteins act as ATP-dependent protein clamps (Confalonieri et al.
(1995) BioEssays 17:639). In addition to the ATP-binding 'A' and 'B' motifs, which are
located in the N-terminal half of this domain, there is a highly conserved region located
in the central part of the domain which was used in the development of the signature
pattern. The consensus pattern is: [LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-
[NS]-x(4)-[LIVM]-D-x-A-[LIFA]-x-R.
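
Consensus patterns written in this notation can be converted mechanically into regular expressions for scanning a translated sequence. The Python sketch below handles only the elements used in this specification: "x" for any residue, "[...]" for allowed residues, "{...}" for excluded residues, and "(n)" or "(n,m)" repeat counts. The function name and the test peptide are hypothetical; the peptide was constructed to satisfy the pattern and is not a sequence of the invention.

```python
import re


def pattern_to_regex(pattern: str) -> str:
    """Convert a hyphen-separated consensus pattern into a regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.strip()
        # Separate the residue specification from an optional (n) or (n,m) suffix.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError("empty pattern element")
        core, low, high = m.groups()
        if core == "x":
            part = "."                       # any residue
        elif core.startswith("[") and core.endswith("]"):
            part = core                      # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            part = "[^" + core[1:-1] + "]"   # excluded residues
        else:
            part = re.escape(core)           # a single fixed residue
        if low and high:
            part += "{%s,%s}" % (low, high)
        elif low:
            part += "{%s}" % low
        parts.append(part)
    return "".join(parts)


AAA_PATTERN = ("[LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-"
               "[NS]-x(4)-[LIVM]-D-x-A-[LIFA]-x-R")
aaa_regex = pattern_to_regex(AAA_PATTERN)

# Hypothetical peptide built to satisfy the pattern, flanked by glycines.
match = re.search(aaa_regex, "GG" + "LALLAGSNAAAALDAALAR" + "GG")
```

A match reports where the signature occurs; the absence of a match does not by itself exclude family membership, since profile searches are more sensitive than a single pattern.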

Bromodomain (bromodomain). SEQ ID NO: 1814 represents a
polynucleotide encoding a polypeptide having a bromodomain region (Haynes et al.,
1992, Nucleic Acids Res. 20:2693-2603, Tamkun et al., 1992, Cell 68:561-572, and
Tamkun, 1995, Curr. Opin. Genet. Dev. 5:473-477), which is a conserved region of
about 70 amino acids. The bromodomain is thought to be involved in protein-protein
interactions and may be important for the assembly or activity of multicomponent
complexes involved in transcriptional activation. The consensus pattern, which spans a 
major part of the bromodomain, is: [STANVF]-x(2)-F-x(4)-[DNS]-x(5,7)-[DENQTF]- 
Y-[HFY]-x(2)- [LIVMFY]-x(3)-[LIVM]-x(4)-[L^^ 
N-[SACF]-x(2)-[FY]. 

Basic Region Plus Leucine Zipper Transcription Factors (BZIP). SEQ
ID NOs:410, 552, 768, 822, 836, 1288, 1365, 1454, 1540, 1549, 1556, 1557, 1563,
1622, 1630, 1704, 1808, 2363, 2424, 3147, 3152, 3158 and 3208 represent
polynucleotides encoding a novel member of the family of basic region plus leucine
zipper transcription factors. The bZIP superfamily (Hurst, Protein Prof. (1995) 2:105;
and Ellenberger, Curr. Opin. Struct. Biol. (1994) 4:12) of eukaryotic DNA-binding
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transcription factors encompasses proteins that contain a basic region mediating 
sequence-specific DNA-binding followed by a leucine zipper required for dimerization. 
The consensus pattern for this protein family is: [KR]-x(1,3)-[RKSAQ]-N-x(2)-
[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK].

EF Hand (EFhand). SEQ ID NOs:820, 1755 and 3285 correspond to

polynucleotides encoding a novel protein in the family of EF-hand proteins. Many
calcium-binding proteins belong to the same evolutionary family and share a type of
calcium-binding domain known as the EF-hand (Kawasaki et al., Protein. Prof. (1995)
2:305-490). This type of domain consists of a twelve residue loop flanked on both sides
by a twelve residue alpha-helical domain. In an EF-hand loop the calcium ion is
coordinated in a pentagonal bipyramidal configuration. The six residues involved in the
binding are in positions 1, 3, 5, 7, 9 and 12; these residues are denoted by X, Y, Z, -Y,
-X and -Z. The invariant Glu or Asp at position 12 provides two oxygens for liganding
Ca (bidentate ligand). The consensus pattern includes the complete EF-hand loop as
well as the first residue which follows the loop and which seems always to be
hydrophobic: D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-
[DENQSTAGC]-x(2)-[DE]-[LIVMFYW].
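
To make the loop positions concrete, the following Python sketch locates an EF-hand loop with a regular expression written by hand from the consensus pattern above and reports the residues at the six coordinating positions. The names are hypothetical, and the test peptide is an illustrative fragment chosen because it satisfies the pattern; it is not a sequence of the invention.

```python
import re
from typing import Dict, Optional

# Hand-written regular expression equivalent of the EF-hand consensus pattern;
# "{...}" exclusions in the pattern become negated character classes.
EF_HAND = re.compile(
    r"D.[DNS][^ILVFYW][DENSTG][DNQGHRK][^GP][LIVMC][DENQSTAGC].{2}[DE][LIVMFYW]"
)

# Loop positions (1-based) of the calcium-coordinating residues.
LIGAND_POSITIONS = {"X": 1, "Y": 3, "Z": 5, "-Y": 7, "-X": 9, "-Z": 12}


def coordinating_residues(sequence: str) -> Optional[Dict[str, str]]:
    """Return the residue at each coordinating position of the first loop found."""
    m = EF_HAND.search(sequence)
    if m is None:
        return None
    loop = m.group()[:12]  # the 13th matched residue lies just outside the loop
    return {label: loop[pos - 1] for label, pos in LIGAND_POSITIONS.items()}


ligands = coordinating_residues("EAFSLFDKDGDGTITTKELGTVMRSL")
```

In this example the residue at position 12 (-Z) is Glu, consistent with the bidentate ligand described above.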

Ets Domain (Ets_Nterm). SEQ ID NO: 1811 represents a polynucleotide
encoding a polypeptide with N-terminal homology in ETS domain. Proteins of this
family contain a conserved domain, the "ETS-domain," that is involved in DNA
binding. The domain appears to recognize purine-rich sequences; it is about 85 to 90
amino acids in length, and is rich in aromatic and positively charged residues (Wasylyk,
et al., Eur. J. Biochem. (1993) 211:7-18). The ets gene family encodes a novel class of
DNA-binding proteins, each of which binds a specific DNA sequence and comprises an
ets domain that specifically interacts with sequences containing the common core
trinucleotide sequence GGA. In addition to an ets domain, native ets proteins comprise
other sequences which can modulate the biological specificity of the protein. Ets genes
and proteins are involved in a variety of essential biological processes including cell
growth, differentiation and development, and three members are implicated in
oncogenic process.

G-Protein Alpha Subunit (G-alpha). SEQ ID NO: 1846 represents a
polynucleotide encoding a novel polypeptide of the G-protein alpha subunit family.
Guanine nucleotide binding proteins (G-proteins) are a family of membrane-associated
proteins that couple extracellularly-activated integral-membrane receptors to
intracellular effectors, such as ion channels and enzymes that vary the concentration of
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second messenger molecules. G-proteins are composed of 3 subunits (alpha, beta and 
gamma) which, in the resting state, associate as a trimer at the inner face of the plasma
membrane. The alpha subunit binds GTP and exhibits GTPase activity. G-protein alpha
subunits are 350-400 amino acids in length and have molecular weights in the range 40-45
kDa. Seventeen distinct types of alpha subunit have been identified in mammals,
and fall into 4 main groups on the basis of both sequence similarity and function:
alpha-s, alpha-q, alpha-i and alpha-12 (Simon et al., Science (1993) 252:802). They are often
N-terminally acylated, usually with myristate and/or palmitate, and these fatty acid
modifications can be important for membrane association and high-affinity interactions
with other proteins.

Helicases conserved C-terminal domain (helicase_C). SEQ ID
NOs:1496, 2826 and 2871 represent polynucleotides encoding novel members of the
DEAD/H helicase family. A number of eukaryotic and prokaryotic proteins have been
characterized (Schmid S.R., et al., Mol. Microbiol. (1992) 6:283; Linder P., et al.,
Nature (1989) 337:121; Wassarman D.A., et al., Nature (1991) 349:463) on the basis of
their structural similarity. All are involved in ATP-dependent, nucleic-acid unwinding.
All DEAD box family members of the above proteins share a number of conserved
sequence motifs, some of which are specific to the DEAD family while others are
shared by other ATP-binding proteins or by proteins belonging to the helicases
'superfamily' (Hodgman T.C., Nature (1988) 333:22 and Nature (1988) 333:578
(Errata)). One of these motifs, called the "D-E-A-D-box", represents a special version of
the B motif of ATP-binding proteins. Some other proteins belong to a subfamily which
has His instead of the second Asp and is thus referred to as the "D-E-A-H-box" subfamily
(Wassarman D.A., et al., Nature (1991) 349:463; Harosh I., et al., Nucleic Acids Res.
(1991) 19:6331; Koonin E.V. et al., J. Gen. Virol. (1992) 73:989). The following
signature patterns are used to identify members of both subfamilies: 1) [LIVMF](2)-D-
E-A-D-[RKEN]-x-[LIVMFYGSTN]; and 2) [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-
[NECR].
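
The two signature patterns can be used together to assign a translated sequence to one of the two subfamilies. In the Python sketch below the first signature is labeled "DEAD" and the second, which carries His where the first carries the second Asp, is labeled "DEAH". The function name and the short test peptides are hypothetical and were built only to satisfy the respective patterns.

```python
import re
from typing import Optional

# Regular expressions written by hand from signature patterns 1) and 2).
DEAD_BOX = re.compile(r"[LIVMF]{2}DEAD[RKEN].[LIVMFYGSTN]")
DEAH_BOX = re.compile(r"[GSAH].[LIVMF]{3}DE[ALIV]H[NECR]")


def helicase_subfamily(sequence: str) -> Optional[str]:
    """Return 'DEAD', 'DEAH', or None when neither signature is present."""
    if DEAD_BOX.search(sequence):
        return "DEAD"
    if DEAH_BOX.search(sequence):
        return "DEAH"
    return None


calls = [helicase_subfamily(s)
         for s in ("GGVLDEADEMLGG", "KKGAIIIDEAHERKK", "MKTAYIAKQR")]
```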

Homeobox domain (homeobox). SEQ ID NOs:1676, 1820 and 1821
represent polynucleotides encoding proteins having a homeobox domain. The
homeobox is a protein domain of 60 amino acids (Gehring In: Guidebook to the
Homeobox Genes, Duboule D., Ed., pp. 1-10, Oxford University Press, Oxford, (1994);
Buerglin In: Guidebook to the Homeobox Genes, pp. 25-72, Oxford University Press,
Oxford, (1994); Gehring, Trends Biochem. Sci. (1992) 17:277-280; Gehring et al.,
Annu. Rev. Genet. (1986) 20:147-173; Schofield, Trends Neurosci. (1987) 10:3-6) first
identified in a number of Drosophila homeotic and segmentation proteins. It is
extremely well conserved in many other animals, including vertebrates. This domain
binds DNA through a helix-turn-helix type of structure. Several proteins that contain a
homeobox domain play an important role in development. Most of these proteins are
sequence-specific DNA-binding transcription factors. The homeobox domain is also
very similar to a region of the yeast mating type proteins. These are sequence-specific
DNA-binding proteins that act as master switches in yeast differentiation by controlling
gene expression in a cell type-specific fashion.

A schematic representation of the homeobox domain is shown below.
The helix-turn-helix region is shown by the symbols H (for helix) and t (for turn).

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxHHHHHHHHtttHHHHHHHHHxxxxxxxxxx
1                                                         60

The pattern detects homeobox sequences 24 residues long and spans
positions 34 to 57 of the homeobox domain. The consensus pattern is as follows:
[LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-[RKNQESTAIY]-
[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW].
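
Because the 24-residue pattern covers positions 34 to 57 of the 60-residue domain, a pattern match can be used to estimate where the whole domain lies in a protein. The sketch below illustrates that arithmetic. It assumes the twelfth class of the pattern reads [RKNQESTAIY], and the function name and the 60-residue test string are hypothetical, written out only for this illustration (it is not one of the SEQ ID NOs of this application).

```python
import re

# The consensus pattern written as a regular expression (24 positions).
HOMEOBOX_SIGNATURE = re.compile(
    r"[LIVMFYG][ASLVR].{2}[LIVMSTACN].[LIVM].{4}[LIV][RKNQESTAIY]"
    r"[LIVFSTNKH]W[FYVC].[NDQTAH].{5}[RKNAIMW]"
)

SIGNATURE_FIRST_POSITION = 34   # the pattern starts at domain position 34
DOMAIN_LENGTH = 60              # the homeobox domain is 60 residues long

def locate_homeobox(sequence: str):
    """Return 1-based inclusive coordinates of the signature and inferred domain."""
    hit = HOMEOBOX_SIGNATURE.search(sequence)
    if hit is None:
        return None
    sig_start, sig_end = hit.start() + 1, hit.end()
    dom_start = sig_start - (SIGNATURE_FIRST_POSITION - 1)
    dom_end = dom_start + DOMAIN_LENGTH - 1
    # Clip to the sequence in case the domain is truncated at either end.
    return {
        "signature": (sig_start, sig_end),
        "domain": (max(1, dom_start), min(len(sequence), dom_end)),
    }

# Hypothetical 60-residue test sequence used only to exercise the code.
test_domain = (
    "RKRGRQTYTRYQTLELEKEFHFNRYLTRRR"
    "RIEIAHALCLTERQIKIWFQNRRMKWKKEN"
)

print(locate_homeobox(test_domain))
print(locate_homeobox("MA" + test_domain + "GG"))
```

With two extra residues in front of the test sequence, both the signature and the inferred domain shift by two positions, which is the behaviour expected from the fixed offset.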

MAP kinase kinase (mkk). SEQ ID NOs:29, 31, 196, 3175, 3190 and
3281 represent novel members of the MAP kinase kinase family. MAP kinases
(MAPK) are involved in signal transduction, and are important in cell cycle and cell
growth controls. The MAP kinase kinases (MAPKK) are dual-specificity protein
kinases which phosphorylate and activate MAP kinases. MAPKK homologues have
been found in yeast, invertebrates, amphibians, and mammals. Moreover, the
MAPKK/MAPK phosphorylation switch constitutes a basic module activated in distinct
pathways in yeast and in vertebrates. MAPKKs are essential transducers through which
signals must pass before reaching the nucleus. For review, see, e.g., Biologique Biol
Cell (1993) 79:193-207; Nishida et al., Trends Biochem Sci (1993) 18:128-31;
Ruderman, Curr Opin Cell Biol (1993) 5:207-13; Dhanasekaran et al., Oncogene (1998)
17:1447-55; Kiefer et al., Biochem Soc Trans (1997) 25:491-8; and Hill, Cell Signal
(1996) 8:533-44.

Protein Kinase (protkinase). SEQ ID NOs:1157, 1478, 1496, 2286, 2969
and 3190 represent polynucleotides encoding protein kinases. Protein kinases catalyze
phosphorylation of proteins in a variety of pathways, and are implicated in cancer.
Eukaryotic protein kinases (Hanks S.K., et al., FASEB J. (1995) 9:576; Hunter T., Meth.
Enzymol. (1991) 200:3; Hanks S.K., et al., Meth. Enzymol. (1991) 200:38; Hanks S.K.,
Curr. Opin. Struct. Biol. (1991) 1:369; Hanks S.K. et al., Science (1988) 241:42) are
enzymes that belong to a very extensive family of proteins which share a conserved
catalytic core common to both serine/threonine and tyrosine protein kinases. There are
a number of conserved regions in the catalytic domain of protein kinases. The first
region, which is located in the N-terminal extremity of the catalytic domain, is a
glycine-rich stretch of residues in the vicinity of a lysine residue, which has been shown
to be involved in ATP binding. The second region, which is located in the central part
of the catalytic domain, contains a conserved aspartic acid residue which is important
for the catalytic activity of the enzyme (Knighton D.R. et al., Science (1991) 253:407).
The protein kinase profile includes two signature patterns for this second region: one
specific for serine/threonine kinases and the other for tyrosine kinases. A third profile
is based on the alignment in (Hanks S.K. et al., FASEB J. (1995) 9:576) and covers the
entire catalytic domain.

The consensus patterns are as follows: 1) [LIV]-G-{P}-G-{P}-
[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-[GSTACLIVMFY]-x(5,18)-
[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K, where K binds ATP; 2)
[LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3), where D is an active
site residue; and 3) [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-
[LIVMFYC], where D is an active site residue.

If a protein analyzed includes two of the above protein kinase signatures,
the probability of it being a protein kinase is close to 100%.
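
The two-signature rule can be sketched as a simple count of how many of the three consensus patterns occur in a sequence. In the Python below the patterns are transcribed by hand into regular expressions; the dictionary keys, function names, and toy sequences are hypothetical and serve only to illustrate the rule, not to reproduce the analysis used for the sequences of the invention.

```python
import re

# The three consensus patterns, transcribed into regular expressions.
SIGNATURES = {
    "atp_binding": re.compile(
        r"[LIV]G[^P]G[^P][FYWMGSTNH][SGA][^PW][LIVCAT][^PD]."
        r"[GSTACLIVMFY].{5,18}[LIVMFYWCSTAR][AIVP][LIVMFAGCKR]K"
    ),
    "ser_thr_active_site": re.compile(
        r"[LIVMFYC].[HY].D[LIVMFY]K.{2}N[LIVMFYCT]{3}"
    ),
    "tyr_active_site": re.compile(
        r"[LIVMFYC].[HY].D[LIVMFY][RSTAC].{2}N[LIVMFYC]"
    ),
}

def kinase_signature_hits(sequence: str):
    """Return the names of the signatures found anywhere in the sequence."""
    return [name for name, regex in SIGNATURES.items() if regex.search(sequence)]

def has_two_signatures(sequence: str) -> bool:
    """True when at least two of the three signatures are present."""
    return len(kinase_signature_hits(sequence)) >= 2

# Hypothetical toy sequences assembled to contain specific motifs.
toy_two_hits = "MALGTGSFGRVMLVKHKETGNHYAMKAAAALIYRDLKPENLLIGG"
toy_one_hit = "FVHRDLRAANILV"

print(kinase_signature_hits(toy_two_hits), has_two_signatures(toy_two_hits))
print(kinase_signature_hits(toy_one_hit), has_two_signatures(toy_one_hit))
```

Patterns 2 and 3 differ at the position after [LIVMFY] (K versus [RSTAC]), so a single catalytic loop matches at most one of them; in practice the two hits would typically be the ATP-binding signature plus one active-site signature.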

Ras family proteins (ras). SEQ ID NOs:1688 and 3258 represent
polynucleotides encoding novel members of the ras family of small GTP/GDP-binding
proteins (Valencia et al., 1991, Biochemistry 30:4637-4648). Ras family members
generally require a specific guanine nucleotide exchange factor (GEF) and a specific
GTPase activating protein (GAP) as stimulators of overall GTPase activity. Among
ras-related proteins, the highest degree of sequence conservation is found in four
regions that are directly involved in guanine nucleotide binding. The first two
constitute most of the phosphate and Mg2+ binding site (PM site) and are located in the
first half of the G-domain. The other two regions are involved in guanosine binding and
are located in the C-terminal half of the molecule. Motifs and conserved structural
features of the ras-related proteins are described in Valencia et al., 1991, Biochemistry
30:4637-4648. A major consensus pattern of ras proteins is: D-T-A-G-Q-E-K-[LF]-G-
G-L-R-[DE]-G-Y-Y.

Thioredoxin family active site (Thioredox). SEQ ID NO:1677 represents
a polynucleotide encoding a protein having a thioredoxin family active site.
Thioredoxins (Holmgren A., Annu. Rev. Biochem. (1985) 54:237; Gleason F.K. et al.,
FEMS Microbiol. Rev. (1988) 54:271; Holmgren A., J. Biol. Chem. (1989) 264:13963;
Eklund H. et al., Proteins (1991) 11:13) are small proteins of approximately one
hundred amino-acid residues which participate in various redox reactions via the
reversible oxidation of an active center disulfide bond. They exist in either a reduced
form or an oxidized form where the two cysteine residues are linked in an
intramolecular disulfide bond. Thioredoxin is present in prokaryotes and eukaryotes
and the sequence around the redox-active disulfide bond is well conserved. All protein
disulfide isomerases (PDI) contain two or three (ERp72) copies of the thioredoxin
domain. The consensus pattern
is: [LIVMF]-[LIVMSTA]-x-[LIVMFYC]-[FYWSTHE]-x(2)-[FYWGTN]-C-
[GATPLVE]-[PHYWSTA]-C-x(6)-[LIVMFYWT] (where the two C's form the redox-
active bond).

Trypsin (trypsin). SEQ ID NO:1410 corresponds to a novel serine
protease of the trypsin family. The catalytic activity of the serine proteases from the
trypsin family is provided by a charge relay system involving an aspartic acid residue
hydrogen-bonded to a histidine, which itself is hydrogen-bonded to a serine. The
sequences in the vicinity of the active site serine and histidine residues are well
conserved in this family of proteases (Brenner S., Nature (1988) 334:528). The
consensus patterns for this trypsin protein family are: 1) [LIVM]-[ST]-A-[STAG]-H-C,
where H is the active site residue; and 2) [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-
S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH], where S is the active site
residue. All sequences known to belong to this family are detected by the above
consensus sequences, except for 18 different proteases which have lost the first
conserved glycine. If a protein includes both the serine and the histidine active site
signatures, the probability of it being a trypsin family serine protease is 100%.

WD Domain, G-Beta Repeats (WD domain). SEQ ID NOs:1336, 1380,
1711, 1762, 1909, 2218, 3047, 3108 and 3292 represent novel members of the WD
domain/G-beta repeat family. Beta-transducin (G-beta) is one of the three subunits
(alpha, beta, and gamma) of the guanine nucleotide-binding proteins (G proteins) which
act as intermediaries in the transduction of signals generated by transmembrane
receptors (Gilman, Annu. Rev. Biochem. (1987) 56:615). The alpha subunit binds to
and hydrolyzes GTP; the functions of the beta and gamma subunits are less clear but
they seem to be required for the replacement of GDP by GTP as well as for membrane
anchoring and receptor recognition. In higher eukaryotes, G-beta exists as a small
multigene family of highly conserved proteins of about 340 amino acid residues.
Structurally, G-beta consists of eight tandem repeats of about 40 residues, each
containing a central Trp-Asp motif (this type of repeat is sometimes called a WD-40
repeat). The consensus pattern for the WD domain/G-Beta repeat family is:
[LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-x(2)-
[LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN].

wnt Family of Developmental Signaling Proteins (Wnt dev sign). SEQ
ID NO:1538 corresponds to a novel member of the wnt family of developmental
signaling proteins. Wnt-1 (previously known as int-1), the seminal member of this
family (Nusse R., Trends Genet. (1988) 4:291), is thought to play a role in intercellular
communication and seems to be a signalling molecule important in the development of
the central nervous system (CNS). All wnt family proteins share the following features
characteristic of secretory proteins: a signal peptide, several potential N-glycosylation
sites and 22 conserved cysteines that are probably involved in disulfide bonds. The
Wnt proteins seem to adhere to the plasma membrane of the secreting cells and are
therefore likely to signal over only a few cell diameters. The consensus pattern, which is
based upon a highly conserved region including three cysteines, is as follows: C-K-C-
H-G-[LIVMT]-S-G-x-C.

Protein Tyrosine Phosphatase (Y phosphatase). SEQ ID NO:1417
represents a polynucleotide encoding a protein tyrosine phosphatase. Tyrosine specific
protein phosphatases (EC 3.1.3.48) (PTPase) (Fischer et al., Science (1991) 253:401;
Charbonneau et al., Annu. Rev. Cell Biol. (1992) 8:463; Trowbridge, J. Biol. Chem.
(1991) 266:23517; Tonks et al., Trends Biochem. Sci. (1989) 14:497; and Hunter, Cell
(1989) 58:1013) catalyze the removal of a phosphate group attached to a tyrosine
residue. These enzymes are very important in the control of cell growth, proliferation,
differentiation and transformation. Multiple forms of PTPase have been characterized
and can be classified into two categories: soluble PTPases and transmembrane receptor
proteins that contain PTPase domain(s). Structurally, all known receptor PTPases are
made up of a variable length extracellular domain, followed by a transmembrane region
and a C-terminal catalytic cytoplasmic domain. PTPase domains consist of about 300
amino acids. The second of two conserved cysteines has been shown to be absolutely
required for activity. Furthermore, a number of conserved residues in its immediate
vicinity have also been shown to be important. The consensus pattern for PTPases is:
[LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY]; C is the active site residue.

Zinc Finger, C2H2 Type (Zincfing C2H2). SEQ ID NOs:308, 807,
1324, 1503, 1527, 3081, 3193 and 3306 correspond to polynucleotides encoding novel
members of the C2H2 type zinc finger protein family. Zinc finger domains (Klug
et al., Trends Biochem. Sci. (1987) 12:464; Evans et al., Cell (1988) 52:1; Payre et al.,
FEBS Lett. (1988) 234:245; Miller et al., EMBO J. (1985) 4:1609; and Berg, Proc. Natl.
Acad. Sci. USA (1988) 85:99) are nucleic acid-binding protein structures. In addition to
the conserved zinc ligand residues, it has been shown that a number of other positions
are also important for the structural integrity of the C2H2 zinc fingers (Rosenfeld et al.,
J. Biomol. Struct. Dyn. (1993) 11:557). The best conserved position is found four
residues after the second cysteine; it is generally an aromatic or aliphatic residue. The
consensus pattern for C2H2 zinc fingers is: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-
x(3,5)-H. The two C's and two H's are zinc ligands.
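
The C2H2 pattern contains variable-length gaps, x(2,4) and x(3,5), and zinc finger proteins often carry several fingers in tandem. A minimal sketch of scanning a sequence for every non-overlapping occurrence of the pattern follows. The function name and the toy two-finger sequence are hypothetical, invented for illustration, and are not sequences of the invention.

```python
import re

# C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H as a regular expression.
C2H2_SIGNATURE = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")

def find_c2h2_fingers(sequence: str):
    """Return (start, end, text) for each non-overlapping match.

    Coordinates are 1-based and inclusive.
    """
    return [
        (hit.start() + 1, hit.end(), hit.group())
        for hit in C2H2_SIGNATURE.finditer(sequence)
    ]

# Hypothetical toy sequence: two finger-like units joined by a short linker.
toy_fingers = "YKCPECGKSFSQKSNLQKHQRTH" + "TGEKP" + "YKCPECGKSFSRSDHLQRHQRTH"

for start, end, text in find_c2h2_fingers(toy_fingers):
    print(start, end, text)
```

In each reported match the first and fourth characters from the start are the two cysteines only when the first gap is two residues long; with longer gaps the ligand positions shift accordingly, so ligand coordinates should be derived from the match rather than assumed.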

Src homology 2. SEQ ID NOs:186, 2591, 3307 and 3339 represent
polynucleotides encoding novel members of the family of Src homology 2 (SH2)
proteins. The Src homology 2 (SH2) domain is a protein domain of about 100 amino
acid residues first identified as a conserved sequence region between the oncoproteins
Src and Fps (Sadowski I. et al., Mol. Cell. Biol. 6:4396-4408 (1986)). Similar sequences
are found in many other intracellular signal-transducing proteins (Russell R.B. et al.,
FEBS Lett. 304:15-20 (1992)). SH2 domains function as regulatory modules of
intracellular signalling cascades by interacting with high affinity with phosphotyrosine-
containing target peptides in a sequence-specific and phosphorylation-dependent
manner (Marangere L.E.M., Pawson T., J. Cell Sci. Suppl. 18:97-104 (1994); Pawson
T., Schlessinger J., Curr. Biol. 3:434-442 (1993); Mayer B.J., Baltimore D., Trends
Cell Biol. 3:8-13 (1993); Pawson T., Nature 373:573-580 (1995)).

The SH2 domain has a conserved 3D structure consisting of two alpha
helices and six to seven beta-strands. The core of the domain is formed by a continuous
beta-meander composed of two connected beta-sheets (Kuriyan J., Cowburn D., Curr.
Opin. Struct. Biol. 3:828-837 (1993)). The profile to detect SH2 domains is based on a
structural alignment consisting of 8 gap-free blocks and 7 linker regions totaling 92
match positions.

Src homology 3. SEQ ID NOs:234, 1832, and 1835 represent
polynucleotides encoding novel members of the family of Src homology 3 (SH3)
proteins. The Src homology 3 (SH3) domain is a small protein domain of about 60
amino acid residues first identified as a conserved sequence in the non-catalytic part of
several cytoplasmic protein tyrosine kinases (e.g., Src, Abl, Lck) (Mayer B.J. et al.,
Nature 332:272-275 (1988)). Since then, it has been found in a great variety of other
intracellular or membrane-associated proteins (Musacchio A. et al., FEBS Lett. 307:55-
61 (1992); Pawson T., Schlessinger J., Curr. Biol. 3:434-442 (1993); Mayer B.J.,
Baltimore D., Trends Cell Biol. 3:8-13 (1993); Pawson T., Nature 373:573-580 (1995)).

The SH3 domain has a characteristic fold which consists of five or six
beta strands arranged as two tightly packed anti-parallel beta sheets. The linker regions
may contain short helices (Kuriyan J., Cowburn D., Curr. Opin. Struct. Biol. 3:828-837
(1993)).

The function of the SH3 domain may be to mediate assembly of specific
protein complexes via binding to proline-rich peptides (Morton C.J., Campbell I.D.,
Curr. Biol. 4:615-617 (1994)).

In general SH3 domains are found as single copies in a given protein, but
there are a significant number of proteins with two SH3 domains and a few with 3 or 4
copies.

Fibronectin type III. SEQ ID NOs:746 and 1192 represent
polynucleotides encoding novel members of the family of fibronectin type III proteins.
A number of receptors for lymphokines, hematopoietic growth factors and growth
hormone-related molecules have been found to share a common binding domain
(Bazan J.F., Biochem. Biophys. Res. Commun. 164:788-795 (1989); Bazan J.F., Proc.
Natl. Acad. Sci. U.S.A. 87:6934-6938 (1990); Cosman D. et al., Trends Biochem. Sci.
15:265-270 (1990); d'Andrea A.D., Fasman G.D., Lodish H.F., Cell 58:1023-1024
(1989); d'Andrea A.D., Fasman G.D., Lodish H.F., Curr. Opin. Cell Biol. 2:648-651
(1990)).

The conserved region constitutes all or part of the extracellular ligand-
binding region and is about 200 amino acid residues long. In the N-terminal of this
domain there are two pairs of cysteines known, in the growth hormone receptor, to be
involved in disulfide bonds.

[Schematic representation not recoverable from the source; only the label "Cytoplasmic" survives.]

Two patterns detect this family of receptors. The first one is derived
from the first N-terminal disulfide loop, the second is a tryptophan-rich pattern located
at the C-terminal extremity of the extracellular region.
A consensus for this protein family is: C-[LVFYR]-x(7,8)-[STIVDN]-C-
x-W [The two C's are linked by a disulfide bond]. A second consensus for this protein
family is: [STGL]-x-W-[SG]-x-W-S.

LIM domain containing proteins. SEQ ID NOs:1269, 1309, 1360, and
1386 represent polynucleotides encoding novel members of the family of LIM domain
containing proteins. A number of proteins contain a conserved cysteine-rich domain of
about 60 amino-acid residues (Freyd G. et al., Nature 344:876-879 (1990); Baltz R. et
al., Plant Cell 4:1465-1466 (1992); Sanchez-Garcia I., Rabbitts T.H., Trends Genet.
10:315-320 (1994)).

In the LIM domain, there are seven conserved cysteine residues and a
histidine. The arrangement followed by these conserved residues is C-x(2)-C-x(16,23)-
H-x(2)-[CH]-x(2)-C-x(2)-C-x(16,21)-C-x(2,3)-[CHD]. The LIM domain binds two zinc
ions (Michelsen J.W. et al., Proc. Natl. Acad. Sci. U.S.A. 90:4404-4408 (1993)). LIM
does not bind DNA; rather, it seems to act as an interface for protein-protein interaction.
The consensus for this protein family is: C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-
x(2)-C-x(2)-C-x(3)-[LIVMF]. The 5 C's and the H bind zinc.

C2 domain (protein kinase C like). SEQ ID NOs:1325 and 2282
represent polynucleotides encoding novel members of the family of C2 domain
containing proteins. Some isozymes of protein kinase C (PKC) contain a domain,
known as C2, of about 116 amino-acid residues, which is located between the two
copies of the C1 domain (that bind phorbol esters and diacylglycerol) and the protein
kinase catalytic domain (Azzi A. et al., Eur. J. Biochem. 208:547-557 (1992); Stabel S.,
Semin. Cancer Biol. 5:277-284 (1994)).

The C2 domain is involved in calcium-dependent phospholipid binding
(Davletov B.A., Suedhof T.C., J. Biol. Chem. 268:26386-26390 (1993)). Since
domains related to the C2 domain are also found in proteins that do not bind calcium,
other putative functions for the C2 domain include binding to inositol-1,3,4,5-
tetraphosphate (Fukuda M., et al., J. Biol. Chem. 269:29206-29211 (1994)).

The consensus pattern for the C2 domain is located in a conserved part
of that domain, the connecting loop between beta strands 2 and 3. The profile for the C2
domain covers the total domain. The consensus for this protein family is: [ACG]-x(2)-
L-x(2,3)-D-x(1,2)-[NGSTLIF]-[GTMR]-x-[STAP]-D-[PA]-[FY].

Serine proteases, trypsin family, active sites. SEQ ID NO:1410
represents a polynucleotide encoding a novel member of the family of serine protease,
trypsin proteins. The catalytic activity of the serine proteases from the trypsin family is
provided by a charge relay system involving an aspartic acid residue hydrogen-bonded
to a histidine, which itself is hydrogen-bonded to a serine. The sequences in the vicinity
of the active site serine and histidine residues are well conserved in this family of
proteases (Brenner S., Nature 334:528-530 (1988)).

A consensus for this protein family is: [LIVM]-[ST]-A-[STAG]-H-C [H
is the active site residue]. A second consensus for this protein family is: [DNSTAGC]-
[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-
[LIVMFYSTANQH] [S is the active site residue].

RNA Recognition Motif Domain (RRM, RBD, or RNP). SEQ ID NOs:
1464 and 1514 represent polynucleotides encoding novel members of the family of
RNA recognition motif domain proteins (Bandziulis R.J. et al., Genes Dev. 3:431-437
(1989); Dreyfuss G. et al., Trends Biochem. Sci. 13:86-91 (1988)).

Inside the putative RNA-binding domain there are two regions which are
highly conserved. The first one is a hydrophobic segment of six residues (which is
called the RNP-2 motif); the second one is an octapeptide motif (which is called RNP-1
or RNP-CS). The position of both motifs in the domain is shown in the following
schematic representation:

xxxxxxx#####xxxxxxxxxxxxxxxxxxxxxxxxxxxxx######xxxxxxxxxx
       RNP-2                             RNP-1

As a consensus pattern for this type of domain the RNP-1 motif was
used. The consensus for this protein family is: [RK]-G-{EDRKHPCG}-[AGSCI]-
[FY]-[LIVA]-x-[FYLM].

Phosphatidylinositol-specific phospholipase C Y Domain. SEQ ID NO:
1707 represents a polynucleotide encoding a novel member of the phosphatidylinositol-
specific phospholipase C, Y domain family of proteins. Phosphatidylinositol-specific
phospholipase C (EC 3.1.4.11), a eukaryotic intracellular enzyme, plays an important
role in signal transduction processes (Meldrum E., et al., Biochim. Biophys. Acta
1092:49-71 (1991)). It catalyzes the hydrolysis of 1-phosphatidyl-D-myo-inositol-
4,5-bisphosphate into the second messenger molecules diacylglycerol and inositol-
1,4,5-triphosphate. This catalytic process is tightly regulated by reversible
phosphorylation and binding of regulatory proteins (Rhee S.G., Choi K.D., Adv. Second
Messenger Phosphoprotein Res. 26:35-61 (1992); Rhee S.G., Choi K.D., J. Biol. Chem.
267:12393-12396 (1992); Sternweis P.C., Smrcka A.V., Trends Biochem. Sci. 17:502-
506 (1992)).

All eukaryotic PI-PLCs contain two regions of homology, referred to as
"X-box" and "Y-box". The order of these two regions is the same (NH2-X-Y-COOH),
but the spacing is variable. In most isoforms, the distance between these two regions is
only 50-100 residues but in the gamma isoforms one PH domain, two SH2 domains,
and one SH3 domain are inserted between the two PLC-specific domains. The two
conserved regions have been shown to be important for the catalytic activity. At the C-
terminal of the Y-box, there is a C2 domain possibly involved in Ca-dependent
membrane attachment.

Serine Carboxypeptidases. SEQ ID NO:1744 represents a
polynucleotide encoding a novel member of the serine carboxypeptidases family of
proteins. Carboxypeptidases may be either metallocarboxypeptidases or serine
carboxypeptidases (EC 3.4.16.5 and EC 3.4.16.6). The catalytic activity of the serine
carboxypeptidases, like that of the trypsin family serine proteases, is provided by a
charge relay system involving an aspartic acid residue hydrogen-bonded to a histidine,
which is itself hydrogen-bonded to a serine (Liao D.I., Remington S.J., J. Biol. Chem.
265:6528-6531 (1990)).

The sequences surrounding the active site serine and histidine residues
are highly conserved in all these serine carboxypeptidases. A consensus for this protein
family is: [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS] [S is the active site residue]. A second
consensus for this protein family is: [LIVF]-x(2)-[LIVSTA]-x-[IVPST]-x-[GSDNQL]-
[SAGV]-[SG]-H-x-[IVAQ]-P-x(3)-[PSA] [H is the active site residue].

dsrm Double-Stranded RNA Binding Motif. SEQ ID NO:1818
represents a polynucleotide encoding a novel member of the dsrm double-stranded
RNA binding motif proteins. In eukaryotic cells, a multitude of RNA-binding proteins
play key roles in the posttranscriptional regulation of gene expression. Characterization
of these proteins has led to the identification of several RNA-binding motifs. Several
human and other vertebrate genetic disorders are caused by aberrant expression of
RNA-binding proteins (C. G. Burd & G. Dreyfuss, Science 265:615-621 (1994)).

Proteins containing double stranded RNA binding motifs bind to specific
RNA targets. Double stranded RNA binding motifs are exemplified by interferon-
induced protein kinase in humans, which is part of the cellular response to dsRNA.

SEQ ID NOs:2577, 3183 and 3195 encode members of the 4 trans-
membrane integral membrane protein family. This family consists of type III proteins,
which are integral membrane proteins that contain an N-terminal membrane-anchoring
domain that is not cleaved during biosynthesis, and which functions as a translocation
signal and a membrane anchor. The proteins also have three additional transmembrane
regions. The consensus pattern is: G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-
[GA]-[STA]-x(2)-[EG]-x(2)-[CWN]-[LIVM](2).

SEQ ID NO:2944 encodes a polypeptide having a calpain large subunit,
domain III. Calpains are a family of intracellular proteases that play a variety of
biological roles. Calpain 3, also known as p94, is predominantly expressed in skeletal
muscle and plays a role in limb-girdle muscular dystrophy type 2A (Sorimachi, H. et
al., Biochem. J. 328:721-732, 1997).

SEQ ID NOs:1911 and 1980 encode polypeptides having a C3HC4 type
zinc finger domain (RING finger), which is a cysteine-rich domain of 40 to 60 residues
that binds two atoms of zinc, and is believed to be involved in mediating protein-protein
interactions. Mammalian proteins of this family include V(D)J recombination
activating protein, which activates the rearrangement of immunoglobulin and T-cell
receptor genes; breast cancer type 1 susceptibility protein (BRCA1); bmi-1 proto-
oncogene; cbl proto-oncogene; and mel-18 protein, which is expressed in a variety of
tumor cells and is a transcriptional repressor that recognizes and binds a specific DNA
sequence. The consensus pattern is: C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA].

SEQ ID NO:3274 encodes a eukaryotic transcription factor with a fork
head domain of about 100 amino acid residues. Proteins of this group are transcription
factors, including mammalian transcription factors HNF-3-alpha, -beta, and -gamma;
interleukin-enhancer binding factor; and HTLF, which binds to a region of human T-
cell leukemia virus long terminal repeat. The consensus pattern is [KR]-P-[PTQ]-
[FYLVQH]-S-[FY]-x(2)-[LIVM]-x(3,4)-[AC]-[LIM].

SEQ ID NO:3345 encodes a polypeptide having a PDZ domain. Several
dozen signaling proteins belong to this group of proteins that have 80-100 residue
repeats known as PDZ domains. Several of the proteins interact with the C-terminal
tetrapeptide motifs X-Ser/Thr-X-Val-COO- of ion channels and/or receptors (Ponting,
C. P., Protein Sci. 6:464-468, 1997).

SEQ ID NO:3351 encodes a polypeptide in the family of phorbol
esters/diacylglycerol binding proteins. Phorbol esters (PE) are analogues of diacylglycerol
(DAG) and potent tumor promoters. DAG activates a family of serine-threonine protein
kinases, known as protein kinase C. The N-terminal region of protein kinase C binds
PE and DAG, and contains one or two copies of a cysteine-rich domain of about 50
amino acid residues. Other proteins having this domain include diacylglycerol kinase;
the vav oncogene; and N-chimaerin, a brain-specific protein. The DAG/PE binding
domain binds two zinc ions through the six cysteines and two histidines that are
conserved in the domain. The consensus pattern is: H-x-[LIVMFYW]-x(8,11)-C-x(2)-
C-x(3)-[LIVMFC]-x(5,10)-C-x(2)-C-x(4)-[HD]-x(2)-C-x(5,9)-C.

SEQ ID NO:2216 encodes a polypeptide having a WW/rsp5/WWP
domain. The domain is named for the presence of conserved aromatic positions,
generally tryptophan, as well as a conserved proline. Proteins having the domain
include dystrophin, vertebrate YAP protein, and IQGAP, a human GTPase activating
protein which acts on ras. The consensus pattern is: W-x(9,11)-[VFY]-[FYW]-x(6,7)-
[GSTNE]-[GSTQCR]-[FYW]-x(2)-P.

SEQ ID NO:2428 encodes a member of the dual specificity phosphatase
family, having a catalytic domain, and SEQ ID NOs:2281 and 2310 encode members
of the protein tyrosine phosphatase family. These families are related and classified as
tyrosine specific protein phosphatases. The enzymes catalyze the removal of a
phosphate group from a tyrosine residue, and are important in the control of cell growth,
proliferation, differentiation, and transformation. The consensus pattern is [LIVMF]-H-
C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY].
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c 
r 


iVIUUUUj4D ;L.AUj 


CH02COH 


1 146 


7 1 695 

— I Do J 


RTA00007700F e 71 1 pc,» n 


c 
r 


rvlUUOUojo 1 U:rUo 


CH02COH 


147 


1S00S5 


RTA0000" ) 6JPF i 10 1 P >en 


c 
r 


rviuuujvv3 .• a. ruy 


LH09LNL 


\ 148 

1 1 ~u 


70700 


RTA0000^7 1 OF i IS 1 P On 


c 
r 


iVIUUUJij / jA.dUj 


CHOjiVlAH 


1 149 


170QR 1 

J /770 1 


RTA00007 n S7F i IS 1 P On 


r 
r 


iW UUU j 9 9 o o A . h Uo 


f [Art r VII 

CH09LNL 


1 150 


1765Q 1 


RTA00007A75F r 0 1 1 P On 


r 
r 


[Viuuujyi i j a . uu i 


f* \ f A A f VI f 

LH09LNL 


151 


Q^O^S 

7_ L/J O 


RTA0000 -) 66 >F m 04 1 P On 
rv 1 i\ vju_7 r.iii.u**. 1 .r. jcu 


F 

r 


MUUlLiJoy^ A. Muo 


LHOjiVIAH 


1 15*> 


1 Q6Q16 

1 707J0 


RTA0000*>66*iF m 0" 1 1 P On 
rv 1 nwuu-0ujr.1n.v-. 1 .r.octj 


c 
r 


tVIUU(JJ2oojv_ .HU") 


r_r a "* \ f \ I r 

LHOjivlAH 


I 153 




RT A0000~>66SF h 04 1 P 


F 

r 


Eviuuuj-iyycjfci. a i i 


LH0SLNH 




17944 J? 


RTA0000"">6S0F n ~> 1 ** P 


c 

r 


MuUUj9oj- A: o 1 _ 


/~- | 1 A A I V i r 

CH09LNL 


1 155 
• 


4 1 A0A 
*+ 1 DUO 


RTA 000077 1 IF r» 10 1 P <sr»n 
rv 1 nuuuu- / ijr.c. iu. 1 .r.jcu 


c 
r 


ivlUUUJ /jU 1 A.vjOj 


/~ 1 1 A 1 \ f * I 

CH04MAL 


156 

1 1 JU 


7 1 191 7 


RT A00007/;/;4F i 10 1 P v"> n 
fv I nWUUiOOHr 4. 1 7. 1 .r. jcq 


c 
r 


[VIU00276j4A:D 1 1 


CH04MAL 


1 1 57 


1714^4 


OT A00007/C7 1 p 1 11 1 p C An 


r 


iV1000jSj2 / A:C 1 1 


CH09LNL 


t 1 S3 

1 1 JO 


170,497 


0TA00007A70F L- 1"? 1 P <Z*n 


c 
r 


\ .4 A A A "* r\"7A A I~> . r>A1 

MQO0_>970OB:D0_ 


CH09LNL 


1 l J7 


17^7Q£ 


RTA00007690F f 17 1 P On 
fv I . b \UUU*J_D0Ur .1.1 < . 1 .r . icq 


r 

r 


iVl(JUQj979?3:H 10 


/"* i iaai vrr 

CH09LNL 


1 1 A A 


17v7Q£ 


dta n000^A90C f 1 7 D C*n 
rv 1 AU"JUU-0o"Jr . r. l /.-.r.jcq 


c 
r 


iV1000j979^B:H 10 


CH09LNL 


1 O 1 


1 -U'toJ 


P TA OOOO^AAIP K 1^ 1 P Q^n 


c 
r 


M0002 166?B:r 12 


CHOjMAH 


1 6*> 

1 U — 




0 TA 00007£71F f 1 7 1 P C^n 


c 
r 


M000j90/_C:h02 


CH09LNL 


1 6 > 


"5 905 1 1 
JoU J I J 


RTa00007^77F « K i p 

rv i HUUUU— o / / r . p. i _ — r. occj 


c 
r 


\ ,1 AAA *» A 1 O/ — . [~ A 1 


/"* ! lAAt V/l 

CH09LNL 


1 64 
1 u-* 




DTA 00007/; 9": c ; o~ ^ D C^n 


c 
r 


M000400/ . D:u 1 1 


CH09LNL 


16S 


179 1 79 

J to l io 


DTA 00007<90F 1 11 1 P On 


c 
r 


iVi00Qj98_UA:F 1 1 


CH09LNL 


166 

1 uu 


477Q4 7 


QT A00007AA- ; iF n ">4 1 D 

rv I -*\uuuu_do ;r .n i .r.occ] 


c 
r 


M000j249?B:D02 


1 IAOI V'TT 

CH08LNH 


1 67 
I u / 


4777£Q 
/ _0 7 


rtaoooo - */;/;-;? h oi i P Q^n 

rv 1 .-\UUuU-0OJr .u.U J . j .r .ocLJ 


c 
r 


MUU0^o2 1 -C :B0o 


ITAOT VMI 

CH08LNH 


1 6Jt 

1 UO 


704 > 1 


DTA000077I0C i 10 1 0 On 


r 
r 


\,(AAA"n" , n I O.fAl 

M00022j9 ! B:hO 1 


CHOjMAH 


1 69 


177n.fi1 
J / /UUJ 


RTA0000^^91F t? 00 P <^n 


c 
r 


M0U04U0o_ B: dO? 


CH09LNL 


1 70 
i / u 


J7 77<Q 


R taoooo">>;/;<f i I i P <*>n 


r 


\J AAA '1 lAi^r, i A 1 

iV1000j2499C : AO 1 


/"* ITAOf \ rl 1 

CH08LNH 


1 7 1 


477^J.Q 
/ J*+7 


RTA00007A69F L- 1 1 1 P On 
rv i muvjuvj— duo r .k.. i _ . i . r . Oct] 


r 


MUUUj j(Jj4C_ . AUo 


/~ [ j A O I VIII 

C H0SLNH 


1 77 


17199 1 


RTA00007A7") F h if) "» P On 
rv 1 rtUUUU—0 / — r.u — u — r.jcq 


r 

r 


\ lAAA^Oz: O P\. ; r A ^ 


/"•tiaat v:i 

CH09LNL 


1 71 


1 997 1 < 

I OO- t J 


RTA00007/;^JF F 1 1 P On 


c 
r 


M00U2 7200 A : r 02 


/"" TfA i* f"» 1 

CH04MAL 


1 74 


17QA91 


RTAO0OO7A9 I F fi 04 ~* P On 
t\ 1 M.UUUU_Oo I r.u.u*+.„. r.ocC| 


c 
r 


M000j98j / B:0 10 


r 1 AA 1 V 1 f 

CH09LNL 


1 75 


190^57 


RTA00OO">£79F rl 1 "? "* P 

rv 1 nUUUVJ-0 / flr.U, l _._,r .ocq 


c 
r 


Mnnn - ' n.i ;;n,ijA i 
M0uuj u 4 30 U. HU4 


TfAAl VIT 

CH09LNL 


176 

i ' u 


179114 


RTA00007A7QF h 10 1 P ^^n 
rv i rtuuuu-u / 7 r . rit i u. i . r. jcu 


c 
r 


N/t AAA " A^ O ^/^* . Lf 1 1 


f II A A I V ; I 

CH09LNL 


1 77 


177010 


RTA00007690F u 1 4 t P On 


c 
r 


\,f AAA^OTAO Q - DAT 

MUUUJ7 /9od.B0- 


/ — I r A A 1 VIT 

CH09LNL


178 


17960"' 


RTA00007/;90F n "*0 » P 

rv i nuuuu- uour . u._.u . _ . r . ocu, 


r 


IV 1 AAA AQ C \ ■ CAT 

MUUUjvo jj A. rt) / 


^LI/^AI VIT 

C HU4LNL 


179 


1777Q 


RTA0000^70QF ti I PS^n 
rv i r\uuuu— / u7r.u— . i ,r. jcu. 


c 
r 


\>1AAAA< AT" Q.Q n 

ivlUUUU^O /;d.d 1- 




180 


176170 

J / UJ / 7 


RTA000O"*6S0F r 15 1 p^ n 
rv i ,*auuuu_uou r i .r. ocy 


c 
r 


\A AAA "* Q~JQ V ■ LJ 1 A 

ivlUUujy /o- A. rt IU 


All A A| v ; I 


181 


375963 


RTA00002675F.i.12.1.P.Seq


F 

t 


MOOO'59' , '?S VRP 

I'lV/vUJ ' - JO." . U I — 


V— I IV 7Ln L 


SEQ ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
182 | 378683 | RTA00002680F.a.14.2.P.Seq | F | M00039773D:A09 | CH09LNL
183 | 374946 | RTA00002673F.j.24.:.p.Seq | F | M00039096A:E07 | CH09LNL
184 | 429583 | RTA00002666F.g.10.1.P.Seq | F | M00032534A:H08 | CH08LNH
185 | 28338 | RTA000027llF.e.l".i P.Seq | F | M00022930C:E02 | CH03MAH
186 | 427970 | RTA00002665F.j.13.1.P.Seq | F | M0003I36SA:E10 | CH08LNH
187 | 379650 | RTA00002633F.h.22.2.P.Seq | F | M00040072C:G09 | CH09LNL
188 | 37966! | RTA00002676F.C.05.:. P.Seq | F | M00039277D:G10 | CH09LNL



WO 01/02568 



PCT/US00/18374 



190 



192 



193 



194 



195 



196 



197 



200 



201 



202 



203 



204 



205 



206 



207 



208 



209 



210 



CLUSTER
376132 



374797 



375339 



397115 



1 86655 



404632 



19609 



404682 



330412 



39441 



180036 



430738 



40667 



397421 



398775 



87345 



379920 



180666 



404340 



175509 



46423 



401713 



377003 



378891 



412778 



214 



215 



216 



213 



219 



220 



221 



in 



224 



225 



226 



227 



373786 



173692 



88388 



35318' 



37713! 



21438 



447487 



364 



404024 



152305 



1 06050 



41 126 



1 1 3496 



447487 



SEQ NAME
RTA00002677F.b.17.2.P.Seq
RTA0000267SF.b.l2.2.P.Seq
RTA00002674F.a.13.1.P.Seq
RTA00002633F.i.22.2.P.Seq
RTA00002712F.i.2U.P.Seq
RTA00002687F.b.13.1.P.Seq
RTA00002709F.C.05.2.P.Seq
RTA00002687F.b.13.2.P.Seq
RTA00002680F.k.15.1.P.Seq
RTA00002639F.d.17.3.P.Seq
RTA00002679F.m.16.1.P.Seq
RTA00002669F.i.15.2.P.Seq
RTA00002712F.g.22.1.P.Seq
RTA00002681F.c.16.2.P.Seq
RTA00002679F.f.11.1.P.Seq
RTA00002712F.f.19.1.P.Seq
RTA00002679F.b.24.2.P.Seq
RTA00002634F.C.04.1.P.Seq
RTA00002687F.b.05.2.P.Seq



ORIENTATION
T 



CLONE ID
M00039340B:E07
M00039444C:H02
M00039120C:C09
M00040076C:D06
M00026941D:A04
M00039766D:H01
M00005457C:A03
M00039766D:H01
M00039816B:D04
M00042742D:D05
M00039710C:G03
M00033231D:B09
M00026882D:G09
M00039854B:F09
M00039675D:H05
M00026S50D.F09
M00039660C:C10
M00040115B:H12
M00039764C:D07



RTA0Q002680F.e.08.:.P.Seq
RTA00002712F.i.02.1.P.Seq
RTA00002685F.p.10.2.P.Seq
RTA00002683F.g.09.1.P.Seq
RTA00002672F.i.13.1.P.Seq
RTA00002685F.i.07.2.P.Seq
RTA00002679F.a.20.2.P.Seq
RTA00002680F.o.20.2.P.Seq
M00039790B:D03
M00026914A:H10



LIBRARY 
CH09LNL 



CH09LNL 



CH09LNL 



CH09LNL 



CH04MAL 



CH14EDT 



CH02COH 



CH14EDT



CH09LNL 



CH15CON 



CH09LNL 



CH08LNH 



CH04MAL 



CH09LNL 



CH09LNL 



CH04MAL 



CH09LNL 



CH09LNL 



CH14EDT



CH09LNL 



CH04MAL 



M00039647A:H11
M00040062B:B05
M00039016A:A02
M00039533D:F04
M00039655C:C07
M0O039835A.-FO*
RTA00002713F.f.22.1.P.Seq
RTA00002676F.b.04.2.P.Seq
RTA00002632F.e.10.1.P.Seq
RTA00002708F.f.17.1.P.Seq
RTA00002639F.e.04.3.P.Seq
RTA00002710F.2L06.I. P.Seq
RTA00002687F.e.13.2.P.Seq
RTA00002712F.d.02.1.P.Seq
RTA00002713F.O.17.1.P.Seq
RTA00002713F.I.12.1.P.Seq



CH12EDT 



CH09LNL 



CH09LNL 



CH12EDT 



CH09LNL 



CH09LNL 



M00027355A:B07
M00039933C:E11
M00004152A:C12
M00042S95A:D10
M00007929C:B08
M00039958A:A03
M00023376B:G04
M00027668C:H12
M00027546C:B10
RTA00002713F.n.20.1.P.Seq



228 



229 



230 



231 



i -» 

2jj 



234 



235 



146335 



76647 



376746 



373523 



455466 



374031 



373997 



455717 



RTA00002689F.e.04.1.P.Seq
M00027625A:H01
M00042S95A:D10
RTA00002712F.j.17.1.P.Seq
RTA00002674F.h.07.2.P.Seq
RTA00002674F.f.12.2.P.Seq
RTA00002674F.n.21.2.P.Seq
RTA00002694F.C.10.1.P.Seq
RTA000026S3F.p,. 1 7. 2. P.Seq
RTA00002673F.m.04.:.P.Seq
RTA00002694F.a.06. ! P.Seq
M000269SOA:D09
M00039140D:D09
M00039133B:F08
M00039177B:D03
M00043461D:E06
M00040105C:FM
M0O039105C:B0S
M00042593C:G06



CH04MAL
CH09LNL
CH09LNL
CH01COH
CH15CON
CH03MAH
CH14EDT
CH04MAL
CH04MAL
CH04MAL
CH04MAL
CH15CON
CH04MAL
CH09LNL
CH09LNL
CH09LNL
CH20COHLV
CH09LNL
CH09LNL
CH20COHLV






SEQ ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
236 | 373837 | RTA00002672F.p.22.2.P.Seq |  | M00039050A:H10 | CH09LNL
237 | 374513 | RTA00002672F.i.16.2.P.Seq |  | M00039015B:H09 | CH09LNL
238 | 375628 | RTA00002672F.k.04.2.P.Seq |  | M00039026D:F05 | CH09LNL
239 | 377732 | RTA00002681F.p.09.1.P.Seq |  | M00039910C:G10 | CH09LNL
240 | 378326 | RTA00002681F.m.11.1.P.Seq |  | M00039896C:H01 | CH09LNL
241 | 378001 | RTA00002681F.m.22.1.P.Seq |  | M00039898D:C06 | CH09LNL
242 | 378459 | RTA0000268!F.L07.2.P.Seq |  | M00039879D:B11 | CH09LNL
243 | 373862 | RTA00002671F.g.01.2.P.Seq |  | M00038284B:H04 | CH09LNL
244 | 373252 | RTA00002670F.k.16.1.P.Seq |  | M00033451A:H01 | CH09LNL
245 | 378475 | RTA00002672F.g.24.1.P.Seq |  | M00039006D:B01 | CH09LNL
246 | 379941 | RTA000026S2F.j.I5.!.P.Seq |  | M00039990C:D10 | CH09LNL
247 | 42770; | RTA00002665F.e.11.1.P.Seq |  | M00028357A:G10 | CH08LNH
248 | 373976 | RTA00002671F.p.15.2.P.Seq |  | M00038619B:A03 | CH09LNL
249 | 431643 | RTA00002669F.I.16.3.P.Seq |  | M00033276D:H09 | CH08LNH
250 | 383502 | RTA00002670F.k.07.1.P.Seq |  | M00033446D:B02 | CH09LNL
251 | 378764 | RTA00002681F.J.04.1.P.Seq |  | M00039884A:H11 | CH09LNL
252 | 431629 | RTA00002669F.L 14.3. P.Seq |  | M00033276B:G08 | CH08LNH
253 | 372992 | RTA00002671F.b.16.2.P.Seq |  | M00033594C:B03 | CH09LNL
254 | 431601 | RTA00002669F.k.08.3.P.Seq |  | M00033263B:G04 | CH08LNH
255 | 21059 | RTA00002710F.c.05.1.P.Seq |  | M0000S053A:FI0 | CH03MAH
256 | 430689 | RTA00002669F.L24.3. P.Seq |  | M00033243B:A05 | CH08LNH
257 | 131764 | RTA00002662F.C.14.1.P.Seq |  | M00006893C:E07 | CH02COH
258 | 373300 | RTA00002674F.C.21.2.P.Seq |  | M00039126D:A08 | CH09LNL
259 | 384601 | RTA00002670F.k.06.1.P.Seq |  | M00033446C:G08 | CH09LNL
260 | 375389 | RTA00002674F.a.13.2.P.Seq |  | M00039120C:C09 | CH09LNL
261 | 15248 | RTA00002710F.f.23.1.P.Seq |  | M000:Z127C:H0: | CH03MAH
262 | 428134 | RTA00002666F.C.15.1.P.Seq |  | M00032540A:A09 | CH08LNH
263 | 374184 | RTA00002672F.a.19.1.P.Seq |  | M00038633A:D07 | CH09LNL
264 | 136225 | RTA00002676F.n.02.2.P.Seq |  | M00039319C:A04 | CH09LNL
265 | 401713 | RTA00002685F.p.10.1.P.Seq |  | M00039647A:H11 | CH12EDT
266 | 27104 | RTA00002661F.a.09.1.P.Seq |  | M00001363D:D09 | CH01COH
267 | 207466 | RTA00002664F.j.08.2.P.Seq |  | M00027733A:A02 | CH04MAL
268 | 143045 | RTA00002663F.a.02.1.P.Seq |  | M00007941D:C09 | CH03MAH
269 | 378830 | RTA00002675F.e.07.1.P.Seq |  | M00039221A:H03 | CH09LNL
270 | 21731 | RTA00002709F.k.07.1.P.Seq |  | M00007013A:D09 | CH02COH
271 | 428552 | RTA00002666F.C.16.1.P.Seq |  | M00032541D:H08 | CH08LNH
272 | 187632 | RTA00002664F.i.15.2.P.Seq |  | M0002~6I7B:CI2 | CH04MAL
273 | 431053 | RTA00002668F.o.05.2.P.Seq |  | M00033130B:F06 | CH08LNH
274 | 188972 | RTA00002664F.d.20.1.P.Seq |  | M00027030C:H06 | CH04MAL
275 | 430678 | RTA00002668F.h.12.1.P.Seq |  | M00032994A:A08 | CH08LNH
276 | 374042 | RTA00002672F.a.08.1.P.Seq |  | M00038631C:B10 | CH09LNL
277 | 24332 | RTA00002709FJ.07.1.P.Seq |  | M00006955C:F06 | CH02COH
278 | 376764 | RTA00002674F.f.20.1.P.Seq |  | M00039135D:F05 | CH09LNL
279 | 13433S | RTA00002662F.c.15.2.P.Seq |  | M00006897A:H02 | CH02COH
280 | 375541 | RTA00002680F.d.21.2.P.Seq |  | M000397S8A:E03 | CH09LNL
281 | 22890^ | RTA00002664F.e.08.2.P.Seq |  | M0002"085C:EI | CH04MAL
282 | 58063 | RTA0000266 1 F.h. IS. l .P.Seq |  | M00004234A:E07 | CH01COH






2s: 



284 



285 



236 
287 



CLUSTER 



SEQ NAME 



330500 



34928 



374028 



374121 



429547 



RTA0000267QF.p. 19.2-P.Seq 



RTA 000027 10F.p.2 1. 1. P.Seq 



RTA00002674F.k.03.2.P.Seq 



RTA00002672F.h.22.2.P.Seq 



RTA00002668F.C.07. 1 .P.Seq 



ORIENTATION 
F 



CLONE ID
M00033533B:E06 



M00022 795 B :G06 CH03 MA H 



M00039|56A:B1 1 CH09LNL 



M000390 1 3 A:C09 CH09LNL 



M000329J 7D:G09 CH08LNH 



LIBRARY 
CH09LNL 



288 



80668 



RTA00002670F.p-l l.2.P.Seq 



M00033581C:H10 CH09LNL



289 



290 



258704 



RTA00002665F.m.Q6. 1 .P.Seq 



380325 



M00032480B:E10 CH08LNH



RTA00002670F.p.22.2. P.Seq 



M00033583D:B05 CH09LNL 



29! 



378326 



RTA0000268IF.m.ll.2.P.Seq 



M00039896C:H01 



CH09LNL 



29; 



375618 



RTA00002675F.d.!3.l.P.Seq 



M000392!8A:F03 



CH09LNL 



20999 



RTA00002709FJ. 16.1. P.Seq 



M00006977C:G04 



CH02COH 



294 



295 



296 



297 



298 



299 



300 



29102 



RTA000027IOF.p.l8.LP.Seq 



M00022793D:B01 CH03MAH



379334 



RTA00002680F.b.22.I.P.Seq 



M00039778C:A04 



CH09LNL 



2394! 



RTA00002709F.i. 12.1 .P.Seq 



M00006886D:H02 



CH02COH 



373998 



RTA00002672F.a. 10.2.P.Seq 



M0003863ID:B02 



CH09LNL 



373325 



RTA00002672F.c.l4.2.P.Seq 



M00038662B:A12 CH09LNL



373818 



RTA00002672F.e.l5.2.P.Seq 



M00038995C:G08 



429843 



RT A00002668F.C. 10.1. P.Seq 



M00032*MSB:E06 



CH09LNL 



CH08LNH 



302 



304 



i05 



306 



307 



427755 



RTA00002665F.d. 19.3. P.Seq 



M0002S3 1 6B:H 12 CH08LNH 



189177 



RTA 000026641^23. 2. P.Seq 



1 3294 



RTA00002709FJ. 15.1. P.Seq 



178801 



RTA00002663F.n.O I.I. P.Seq 



230865 



RT A 00002664F.d.03.2 . P. Seq 



178801 



RTA00002663F.m.24. 1 .P.Seq 



378809 



RTA00002672F.S.2 1 .2.P.Seq 



M00026922C:G03 CH04MAL 



M00006968A:G08 CH02COH 



M00022997A:F06 CH03MAH 



M0002692SD:A03 CH04MAL 



M0002 2 7 a : F 06 CH03MAH 



M00039005CH0I 



CH09LNL 



!08 



309 



44- 



378957 



RTA00002670F.d. 1 7.2.P.Seq 



M00033362CC05 



CH09LNL 



373523 



RTA00002674F.n.2 1.1. P.Seq 



M00039t77B:D03 



CH09LNL 



375458 



RTA00002678F.L06.2.P.Seq 



M000396! i D:DI 1 



CH09LNL 



314 



315 



16 



72797 



RTA0000266 lF.e.07.1. P.Seq 



429992 



RTA00002663F.C.2 1 . 1 .P.Seq 



374410 



RTA00002674F.k.l l.2.P.Seq 



376553 



RTA00002674F.g. 19.1. P.Seq 



429096 



RTA00002666F.f. 1 6. 1 P.Seq 



M00003761CF02 



CH01COH



M0003292lB:H08 



CH08LNH 



M00039158B:GI2 



CH09LNL 



M00039I39A:C09 



CH09LNL 



M000J2578A:G06 



CH08LNH



18 



319 



120 



3JM 
322 



323 
324 



325 



326 



327 



328 



j29 



181943 



RTA00002663F.n.05.LP.Seq 



M00023003CDO' 



CH03MAH 



378475 



RTA00002672F.h.0l.2.P.Seq 



M00059006D:BOI 



CH09LNL 



427336 



RTA00002665F.C.23.I. P.Seq 



M000:s:i0B:D02 



CH08LNH



374042 



RTA00002672F.a.08.2.P.Seq 



M00038b3IC:BIO 



CH09LNL 



336543 
376659 



RTA00002672FT. 1 3.2. P.Seq 



M00033OP9B:Gl I 



CH09LNL 



RTA00002678F.h. 1 1 .2.P.Seq 



MO0O3»?475C:EIO 



29135 



RTA00002663F.c.09.LP.Seq 



377967 



RTA0000268!F.m.l7.2.P.Seq 



431330 



RTA00002668F.m. 1 6.2. P.Seq 



373824 



RTA00002680F.i.l9.2.P.Seq 



50094 



RTA0000266 1 Fj.02.2.P.Seq 



214272. 



RTA00002664FA03 .2. P.Seq 



377293 



RTA00002630F.b. I T.2.p,Seq 



M0002l^23C:Dt 1 



M00039S^7D:CI0 



M000330"4A:C08 



M00039S08D:H02 



M000043-SA.B10 



M00027366A:FI 1 



M000;o™C:E05 



CH09LNL 



CH03MAH 



CH09LNL 



CH08LNH 



CH09LNL 



CH01COH



CH04MAL 



CH09LNL 



SEQ ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
330 | 195053 | RTA00002663F.n.16.1.P.Seq | F | M00023044B:D02 | CH03MAH
331 | 21274 | RTA00002709F.m.09.1.P.Seq | F | M00007194A:B09 | CH02COH
332 | 376580 | RTA00002675F.b.20.1.P.Seq | F | M00039212C:C12 | CH09LNL
333 | 374725 | RTA00002673F.f.02.2.P.Seq | F | M00039070D:C02 | CH09LNL
334 | 25238 | RTA00002710F.n.08.1.P.Seq | F | M000226J4D:COS | CH03MAH
335 | 377337 | RTA000026S3F.L07.2. P.Seq | F | M00040085D:A10 | CH09LNL
336 | 450485 | RTA00002692F.a.13.2.P.Seq | F | M00042625C:B04 | CH18CON
337 | 21989 | RTA00002709F.h.22.1.P.Seq | F | M00006861B:F09 | CH02COH
338 | 58296 | RTA00002661F.i.20.2.P.Seq | F | M00004354D:E05 | CH01COH
339 | 379144 | RTA00002679F.l.14.1.P.Seq | F | M00039705D:F02 | CH09LNL
340 | 379690 | RTA00002680F.b.21.2.P.Seq | F | M00039778B:G03 | CH09LNL
341 | 379640 | RTA00002681F.d.12.2.P.Seq | F | M00039859C:G10 | CH09LNL
342 | 373988 | RTA00002673F.h.23.1.P.Seq | F | M00039079A:A05 | CH09LNL
343 | 373988 | RTA00002673F.h.23.2.P.Seq | F | M00039079A:A05 | CH09LNL
344 | 380673 | RTA00002673FJ.13.2.P.Seq | F | M00039084C:H03 | CH09LNL
345 | 55243 | RTA00002661F.i.06.2.P.Seq | F | M00004282D:C11 | CH01COH
346 | 40557 | RTA00002713F.h.21.1.P.Seq | F | M00027398C:F07 | CH04MAL
347 | 375467 | RTA00002677F.m.03.1.P.Seq | F | M000;^4I7A:D03 | CH09LNL
348 | 398406 | RTA00002679FJ.02.1.P.Seq | F | M00039689C:E08 | CH09LNL
349 | 430392 | RTA00002668F.k.19.1.P.Seq | F | M00033037D:C11 | CH08LNH
350 | 376746 | RTA00002674F.f.12.1.P.Seq | F | M00039133B:F08 | CH09LNL
351 | 115595 | RTA00002713F.e.07.1.P.Seq | F | M000:729TA:C04 | CH04MAL
352 | 377182 | RTA00002682F.U 1.1. P.Seq | F | M00040010A:F10 | CH09LNL
353 | 380659 | RTA00002634F.e.07.2.P.Seq | F | M000^0124D:H01 | CH09LNL
354 | 373862 | RTA00002671F.g.01.1.P.Seq | F | M00038284B:H04 | CH09LNL
355 | 376096 | RTA00002677F.b.16.2.P.Seq | F | M00039}40A;D05 | CH09LNL
356 | 372887 | RTA00002670F.d.05.2.P.Seq | F | M00033358A:H12 | CH09LNL
357 | 378475 | RTA00002672F.g.24.2.P.Seq | F | M00039006D:B01 | CH09LNL
358 | 427336 | RTA00002665F.C.23.3.P.Seq | F | M00028210B:D02 | CH08LNH
359 | 373814 | RTA00002672F.b.02.2.P.Seq | F | M0003S63?A:G09 | CH09LNL
360 | 215506 | RTA00002664F.h.03.2.P.Seq | F | M00O:7438C:GO7 | CH04MAL
361 | 374465 | RTA00002673F.C.07.2.P.Seq | F | M00039058C:H02 | CH09LNL
362 | 428784 | RTA00002667F.C.18.1.P.Seq | F | M00032744B:F10 | CH08LNH
363 | 379581 | RTA00002676F.3.2 1 .2. P.Seq | F | M00039273B:F02 | CH09LNL
364 | 378371 | RTA00002678F.f.20.2.P.Seq | F | M00039465A:A08 | CH09LNL
365 | 375154 | RTA00002676F.c.13.2.P.Seq | F | M00039279B:H02 | CH09LNL
366 | 431214 | RTA00002669F.k.04.1.P.Seq | F | M00033262D:A11 | CH08LNH
367 | 376053 | RTA00002675F.l.03.1.P.Seq | F | M00039249A:C12 | CH09LNL
368 | 373282 | RTA00002680F.j.19.2.P.Seq | F | M00039SI3B:DI 1 | CH09LNL
369 | 33397 | RTA00002661F.h.04.1.P.Seq | F | M00004168A:G11 | CH01COH
370 | 376706 | RTA00002675F.C.02.1.P.Seq | F | M00039213B:F05 | CH09LNL
371 | 378292 | RTA00002681F.i.09.2.P.Seq | F | M00039880A:H11 | CH09LNL
372 | 431612 | RTA00002669F.e.23.3.P.Seq | F | M00033202D:G06 | CH08LNH
373 | 378471 | RTA00002679F.o.17.1.P.Seq | F | M00O3972"C:B09 | CH09LNL
374 | 378666 | RTA00002681F.i.05.2.P.Seq | F | M00039ST9C:F05 | CH09LNL
375 | 374894 | RTA00002675F.f.04.1.P.Seq | F | M00039Z24A:£12 | CH09LNL
376 | 430191 | RTA00002667F.j.24.1.P.Seq | F | M00032829B:E06 | CH08LNH

SEQ ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
377 | 428581 | RTA00002667F.C.12.1.P.Seq | F | M00032739A:A06 | CH08LNH
378 | 379598 | RTA00002679F.k.03.1.P.Seq | F | M00039697B:F11 | CH09LNL
379 | 45300 | RTA00002710FJ.23.1.P.Seq | F | M00022434D:D06 | CH03MAH
380 | 23030 | RTA00002709F.b.10.1.P.Seq | F | M00005384A:C11 | CHO^COH
381 | 379928 | RTA00002679F.O.06.1.P.Seq | F | M00039720D:D02 | CH09LNL
382 | 430191 | RTA00002667F.k.01.1.P.Seq | F | M00032829B:E06 | CH08LNH
383 | 374684 | RTA00002675F.2.02.1.P.Seq | F | M00039228A:B05 | CH09LNL
384 | 375728 | RTA00002676F.h.05.2.P.Seq | F | M00039299B:G12 | CH09LNL
385 | 230237 | RTA00002670F.b.08.2.P.Seq | F | M00033306D:H09 | CH09LNL
386 | 380673 | RTA00002673FJ.13.1.P.Seq | F | M00039084C:H03 | CH09LNL
387 | 378938 | RTA00002679F.k.20.1.P.Seq | F | M00039702A:B12 | CH09LNL
388 | 375115 | RTA00002673F.e.01.1.P.Seq | F | M00039066D:G08 | CH09LNL
389 | 378673 | RTA00002680F.D.21.2.P.Seq | F | M00039838A:F05 | CH09LNL
390 | 372909 | RTA00002670F.3.12.2.P.Seq | F | M00033300D:H12 | CH09LNL
391 | 373300 | RTA00002674F.C.21.1.P.Seq | F | M00039126D:A08 | CH09LNL
392 | 379318 | RTA00002683F.h.16.2.P.Seq | F | M00040071B:A10 | CH09LNL
393 | 378319 | RTA00002681F.k.07.2.P.Seq | F | M000^9890 A HO^ | CH09LNL
394 | 374608 | RTA00002675F.3.20.1.P.Seq | F | M000^9*'30A A 10 | CH09LNL
395 | 374328 | RTA00002673F.c.24.2.P.Seq | F | M00039061B:F08 | CH09LNL
396 | 374328 | RTA00002673F.d.01.2.P.Seq | F | M00039061B:F08 | CH09LNL
397 | 428401 | RTA00002667F.b.07.1.P.Seq | F | MOOC^^C F06 | CH08LNH
398 | 136202 | RTA00002687F.p.05.2.P.Seq | F | M000-10349DB09 | CH14EDT
399 | 374394 | RTA00002673F.C.15.1.P.Seq | F | MOOO"90*9C'G08 | CH09LNL
400 | 37784 | RTA00002708F.C.17.1.P.Seq | F | M0000^816DEI 1 | CH01COH
401 | 378282 | RTA00002681F.h.11.1.P.Seq | F | M000*9876DH0^ | CH09LNL
402 | 185663 | RTA00002712F.p.17.2.P.Seq | F | M000"7178BG09 | CH04MAL
403 | 14866 | RTA00002709F.d.14.1.P.Seq | F | M0000^623DG12 | CH0 n COH
404 | 383502 | RTA00002670F.k.07.2.P.Seq |  | M00033446D:B02 | CH09LNL
405 | 13463 | RTA00002709F.f.18.1.P.Seq | F | M00006657C:G05 | CH02COH
406 | 21274 | RTA00002709F.m.09.2.P.Seq | F | M00007194A:B09 | CH02COH
407 | 13745 | RTA00002714F.b.13.1.P.Seq | F | M00027801C:C11 | CH04MAL
408 | 23485 | RTA00002714F.C.10.1.P.Seq | F | M0002?836D:F12 | CH04MAL
409 | 428364 | RTA00002667F.C.09.1.P.Seq | F | M0003273~B:E09 | CH08LNH
410 | 431629 | RTA00002669F.l.14.2.P.Seq | F | M00033276B:G08 | CH08LNH
411 | 379754 | RTA00002682F.h.0S. 1 .P.Seq | F | M00039983D:A06 | CH09LNL
412 | 431601 | RTA00002669F.k.08.2.P.Seq | F | M00033263B:G04 | CH08LNH
413 | 375749 | RTA00002630F.f.23.2.P.Seq | F | M0003^795D;G06 | CH09LNL
414 | 378764 | RTA00002681F.j.04.2.P.Seq | F | M00039884A:H11 | CH09LNL
415 | 215605 | RTA00002664F.i.20.1.P.Seq | F | M000 n "64 r CD0" | CH04MAL
416 | 376144 | RTA00002675FJ.09.1.P.Seq |  | M0003°24IA:E1I | CH09LNL
417 | 373071 | RTA00002670FJ.23.2.P.Seq |  | M00033442A:D06 | CH09LNL
418 | 379684 | RTA00002631F.C.09.2.P.Seq |  | M00039851B:G11 | CH09LNL
419 | 379610 | RTA000026SOF.k.l 1.2. P.Seq |  | M000:^815C:F09 | CH09LNL
420 | 22392 | RTA00002:O8 Fa. 10. l.P.Seq |  | M0000!3«55D:K02 | CH01COH
421 | 377555 | RTA000026S3F.I.08.2. P.Seq |  | M000-i0085D:E04 | CH09LNL
422 | 32624 | RTA00002713F.f.15.1.P.Seq |  | M0002"347C:G07 | CH04MAL
423 | 375024 | RTA000026"5F.p. 1 2. 1 P.Seq |  | M0003°266D-FI2 | CH09LNL






SEQ ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
424 | 374725 | RTA00002673F.f.02.1.P.Seq | F | M00039070D:C02 | CH09LNL
425 | 376228 | RTA00002676F.f.19.2.P.Seq | F | M00039293A:H04 | CH09LNL
426 | 375906 | RTA00002675F.L 18.1. P.Seq | F | M0003923SD:A08 | CH09LNL
427 | 186190 | RTA00002714F.a.04.1.P.Seq | F | M00027729D:H06 | CH04MAL
428 | 57694 | RTA00002713F.f.02.1.P.Seq | F | tM00027319D:Btt | CH04MAL
429 | 7007 | RTA00002709F.d.08.1.P.Seq | F | M00005614B:B01 | CH02COH
430 | 400084 | RTA00002685F.o.19.2.P.Seq | F | M00039641C:D07 | CH12EDT
431 | 375648 | RTA00002676F.h.18.2.P.Seq | F | M00039301B:F06 | CH09LNL
432 | 166493 | RTA00002663F.h.08.1.P.Seq | F | M00022492C:A02 | CH03MAH
433 | 379632 | RTA00002682F.h.14.1.P.Seq | F | M00039984B:G12 | CH09LNL
434 | 373234 | RTA00002676F.g.15.2.P.Seq | F | M00039297C:H08 | CH09LNL
435 | 401230 | RTA00002685F.i.05.2.P.Seq | F | M00039533A:C12 | CH12EDT
436 | 186623 | RTA00002712F.f.15.1.P.Seq | F | M00026843B:D10 | CH04MAL
437 | 127714 | RTA00002712F.k.14.1.P.Seq | F | M00027018A:C09 | CH04MAL
438 | 451857 | RTA00002692F.a.01.!.P.Seq | F | M00042584B:C10 | CHI SCON
439 | 404620 | RTA00002687F.c.03.2.P.Seq | F | M00039770A:G11 | CH14EDT
440 | 186872 | RTA00002663F.k.23.1.P.Seq | F | M00022797B:G08 | CH03MAH
441 | 42729 | RTA00002709F.C.06.:. P.Seq | F | M00005458A:F11 | CH02COH
442 | 373380 | RTA00002674F.b.07.1.P.Seq | F | M000J9I23A:BI0 | CH09LNL
443 | 374465 | RTA00002673F.C.07.1.P.Seq | F | M00039058C:H02 | CH09LNL
444 | 403557 | RTA00002637F.d.10.2.P.Seq | F | M00039948A:E03 | CH14EDT
445 | 16749 | RTA00002709F.b.14.2.P.Seq | F | M00005402B:F08 | CH02COH
446 | 375592 | RTA00002680F.f.22.2.P.Seq | F | M00039795D:E10 | CH09LNL
447 | 376103 | RTA00002676F.g.06.2.P.Seq | F | M000J92953:D03 | CH09LNL
448 | 40228 | RTA00002712F.l.18.1.P.Seq | F | M00027049B:F05 | CH04MAL
449 | 374606 | RTA00002673F.J.23.1.P.Seq | F | M0003W6A:A05 | CH09LNL
450 | 378270 | RTA00002680F.h.08.2.P.Seq | F | MOOO.^SOIA.-HH | CH09LNL
451 | 236321 | RTA00002668F.k.14.1.P.Seq | F | M0003J034C:F02 | CH08LNH
452 | 378676 | RTA00002680F.m.20.2.P.Seq | F | M00039S27B:F07 | CH09LNL
453 | 373252 | RTA00002670F.k.16.2.P.Seq | F | M00033451A:H01 | CH09LNL
454 | 384601 | RTA00002670F.k.06.2.P.Seq | F | M00033446C:G08 | CH09LNL
455 | 403772 | RTA00002687F.a.03.2.P.Seq | F | M00039746C:G09 | CH14EDT
456 | 379566 | RTA000026S3F.k.04.l. P.Seq | F | M000400S1C:E01 | CH09LNL
457 | 136202 | RTA00002687F.p.05.1.P.Seq | F | M000-10349D:B09 | CH14EDT
458 | 14317 | RTA00002713F.C.13.1.P.Seq | F | M00027248A:C02 | CH04MAL
459 | 375349 | RTA00002672F.j.11.1.P.Seq | F | M0O0J^O:4B:BI0 | CH09LNL
460 | 403020 | RTA00002687F.a.02.2.P.Seq | F | M0003*T46C:A08 | CH14EDT
461 | 374060 | RTA00002672F.L07.1. P.Seq | F | M0003 l >014B:C04 | CH09LNL
462 | 183399 | RTA00002712F.0. 10.!. P.Seq | F | M000:?136C:C09 | CH04MAL
463 | 373789 | RTA000O267IF.C.2O.!. P.Seq | F | M0O03S259B:GO8 | CH09LNL
464 | 20168 | RTA00002711F.b.22.1.P.Seq | F | M000::S34B:G1 I | CH03MAH
465 | 452641 | RTA0O0O2692F.d.O5.:. P.Seq | F | M00043003C:D08 | CHI SCON
466 | 431370 | RTA00002669F.m.04.2.P.Seq | F | M000332SSB:DI2 | CH08LNH
467 | 153044 | RTA00002713F.j.03.1.P.Seq | F | M0O0:"4"6A:CO9 | CH04MAL
468 | 378229 | RTA0O002679F.c.!6.:.P.Seq | F | M000;°6o3C:G09 | CH09LNL
469 | 374328 | RTA00002673F.d.01.1.P.Seq | F | M00039061B:F08 | CH09LNL
470 | 39606 | RTA00002713F.i.20.1.P.Seq | F | MO00:"4oSA:C09 | CH04MAL






472 



476 



485 



486 



487 



496 



497 



498 



499 



505 



509 



510 



511 



515 



516 



CLUSTER 
59077 



1935 



SEQ NAME
RTAOOo'oiTljF.n.Ol.l.P.Seq 



379684 
451564 
7571 



129323 
12960 
136730 
59077 
85884 



RTA0000271QF.b.l 1. 1. P.Seq 
RTA0000268IF.e.09,l,P.Seq 
RTA0000269 IF.f. 1 2.2.P.Seq 
RTA0000271QF.a.l5.I.P.Seq 



RTAQ0002713F.k.2I.I.P.Seq 
RTA00002710F.a.23.1. P.Seq 
RTA000027l3F.o.05.1.P.$eq 
RTA00002713F.m.24. 1 .P.Seq 
RTA00002712F.b.Q6. 1 .P.Seq 



ORIENTATION 
F 



19471 
45206 



RTA00002703F.g.08.1. P.Seq 
RTA00002710F.C.06.1. P.Seq 



404257 I RTA00002687F.g.06.2.P.Seq 
372997 1 RTA00002679F.p.04.1,p.Se7 



CLONE ID 
M00027596CE06 



M00003006B:B03 
M00039851B:G1 1 
tM000434| ID:H06 



M00007943D:C09 



M00027525B:D06 
M00007976A:C10 
M0002764IC:A03 



M00027596C:E06 
M000233I6C:G08 



M00004I97B:HI0 



M00OO8063B:A06 



43792 I RTA00002713F.k.l6.I.P.Seq 
400052 RTA0000:687F.h.l3.2.P.Seq" 
452194 1 RTA00Q02692F.cJ4.2.pTSeq' 



488 24034 I RTA00002710F.b.06.1. P.Seq 



439 447544 | RTA00002689F.e. 1 3. 1 .P.Seq 



490 401372 I RTA000026S6F.C.23. 1 .P.Seq 



376553 RTA00002674F.g.l9.2.P.Seq 



45505 
16760 
374174 
374233 



375772 
376417 



RTAOO0O:694F.a.O7. 1 . P.Seq 
RTAOO0O27Q8F.j.03.I.P.Seq 
RTA00002672F.i.l2.2.P.Seq 
RTA00002672F.k.2l.2.P. Seq 



RTA0000263IF.O.24. 1 .P.Seq 



RTA00002673F.i.03.2. P.Seq 



42897 1 RTA00002666F.O.02. 1 P.Sea 



MO0Q4O2O8A:C03 
M00059729A:AI0 



M00027520A:C05 



M00040291D;C05 
M00042988A:F06 



LIBRARY 
CH04MAL 



CH03MAH 



CH09LNL 
CH17COHLV



CH03MAH 



CH04MAL 
CH03MAH 
CH04MAL 



CH04MAL 
CH04MAL 



CH01COH 



CH03MAH 



CH14EDT 
CH09LNL 



CH04MAL 



CH14EDT 
CH13C0N 



M00007992CF06 CH03MAH 



iM00042905D:DQ2 
M00040141D;F05 
M00039139A;C09 



CH15CON
CH13EDT



CH09LNL 



M0004259 5A.A 1 1 | CH20COHL V 



M00004393B:E07 CH01COH



M0QQ390I5 A:D07 
M00039030B:E02 



CH09LNL 
CH09LNL 



M00039909C:G05 | CH09LNL 



M00039477D;A10 | CH09LNL 



394098 I RTA00002631F.j.l5.I.P.Seq 



379761 
374266 



RTA00002670F.n.Q3. 1 .P.Seq 
RTA00002674F.i.03.1. P.Seq 



372946 
223909 
427524 



RTA00002670F.L07.1. P.Seq 
RTA00002664F.e.03. 1 .P.Seq 
RTA00002665F.e.05. 1 .P.Seq 



M00039837C:E07 
M00033561C:A02 



CH09LNL 
CH09LNL 



330413 RTA00002630F.k.l9.2.P.Seq 



373366 
427202 
375000 



378833 



RTA00Q0:67 1 F.c.24,2, P.Seq 
RTA00002665F.g. 1 5. 1 .P.Seq 
RTA0000267QF.j.l3.I.P.Seq 



24945 I RTA00002710F.p.05.1, P.Seq 



20277 RTA0000271QF.e.l7. LP.Seq 



20320 
37679 1 
9809 



RTA00002710F.e.02.1. P.Seq 
RTA000026 74F.1. 1 7.2.P.Seq 
RTA000027lQF.g.l2.I.P.Seq 



429562 RTA00002667F.m.03. 1 . P.Seq 



2920 RTA00002710F.e. 15.1. P.Seq 



377565 | RTA00002684F.h. 1 9. 1 P.Seq 



M00039144C:E06 | CH09LNL 



M00033457D;A05 | CH09LNL 
M00027035C:E1 1 
M00023354D:A03 



CH04MAL 



CH03LNH 



M000393I6C:D05 1 CH09LNL 



M00033259C:H09 | CH09LNL 



M000286|7C:AI2 [ CHOSLNH 



M0003 345 7CC03 I CH09LNL 



M0003963 7C:AI0 I CH09LNL 



MOOO:273^A:B03 I CH03MAH 



M0002 1972D:C1I I CH03MAH 



M0002 1 9 1^C:AI0 I CH03VIAH 



M00039|66B:G06 



M00022I78B:D06 



M000;2S53D:GI2 



M00021964C:E10 



MOOO-i0309A:El I 



CH09LNL 



CH03MAH 



CHOSLNH 



CH03MAH 



CH09LNL 






SEQ 
ID 



518 



519 



520 



521 



522 



523 



524 



525 



526 



527 



523 



529 



530 



531 



532 



^ -» -» 
j j j 



534 



535 



CLUSTER 



429356 



427634 



427713 



373607 



378781 



SEQ NAME 



ORJ EN 



RTA00OO2668F.d.23. 1 .P.Seq 



RTA00002665F.f.09. 1 .P.Seq 



RTA00002665F.e.23.I.P.Seq 



RTA00002674F.d.l5.2.P.Seq 



RTA00002674F.0. 1 4. 1 .P.Seq 



429361 



126754 



428047 



1886: 



379761 



46407 



21365 



427466 



400265 



(80056 



375324 



25165 



401296 



RTA00002666F.d.l !. I. P.Seq 



RTA00002663F.a.l6.I.P.Seq 



RTA00002665 F.k. 1 0. 1 .P.Seq 



RTA00002709F.d. 1 5. 1 .P.Seq 



RTA000G2670F.n.03.2.P.Seq 



RTA00002665F.C. 10. 1. P.Seq 



RTA00002709F.k.06. 1 .P.Seq 



RTA00002665F.b.I 1.1. P.Seq 



RTA00002635F.c.03.2.P.Seq 



RTA00002630F.a: 16.2. P.Seq 



RTA00002678F.1. 12.2. P.Seq 



RTA00002710F.k.l7.l. P.Seq 



RTA00002685F.h.23.2.P.Seq 



ATION 



CLONE ID 



LIBRARY 



M00032933A:CIO CH08LNH 



M00028369D:E08 CH08LNH 



M00028364C:G08 



M00032550D:C02 



M0003I417CG09 



M0OO3356IC:A02 



M0002SI96D:A03 



M00039374B:B07 



M00039773D:F 



M000396I23BIO 



M00039529C:D07 



CH08LNH 



M00039127D:EIO CH09LNL 



M000391963:H06 CH09LNL 



CH08LNH 



M00008045A:H02 CH03MAH 



CH08LNH 



M00005625A:C02 CH02COH 



CH09LNL 



CH08LNH 



M000070I2D:H08 CH02COH 



M0002SI84D:GI0 CH08LNH 



CH12EDT 
CH09LNL 



CH09LNL 



M000224963 E12 CH03MAH 



CH12EDT



536 



537 



538 



539 



540 



541 



! 94098 



RTA000026SlF.j.!5.2.P.Seq 



17430 



RTA00002710F.LI 1.1 .P.Seq 



373820 



RTA00002674F.d.06.1. P.Seq 



378548 



RTA00002672F.g.l4.2.P.Seq 



222679 



76S74 



RTA00002664FT. 1 8.2.P.Seq 



RTA00002670F.e.23.2.P.Seq 



M0003988~C:E07 



M00039127A:GI I 



M0003 900^3:0 



MO0OZ7228D:A0 



M00033375A:G04 



CH09LNL 



M00022365D:A03 CH03MAH 



CH09LNL 



CH09LNL 



CH04MAL 



CH09LNL 



542 



543 



544 



545 



546 



547 



548 



21329 



RTA00002709F.b.Q8. 1 .P.Seq 



1 19905 



RTA00002710F,p.l3.l .P.Seq 



377028 



RTA00002678F.n.2 1 .2. P.Seq 



373351 



RTA0000267IF.l.lS.5.P.Seq 



376082 



RTA00002674F.m. 1 7. 1 .P.Seq 



7698' 



RTA00002678F.2.2 1.2.P.Seq 



6I92I 



RTA0000266 1 F.g.OS. 1 .P.Seq 



M00005379A:E04 



M00039631AC10 



MO0O3S327D:AO5 



M00039I71 3:D I I 



M00039472C:B08 



M000039953:E03 



CH02COH 



M0002278:C:G06 CH03MAH 



CH09LNL 



CH09LNL 



CH09LNL 



CH09LNL 



CH01COH



549 



550 



551 



552 



55 



554 



556 



557 



558 



559 
560 



56 



562 
563 



564 



373486 



RTA00002672F.b.03.2. P.Seq 



380355 



RTA00002670F.o.06.2.P.Seq 



430295 



RTA00002667F.h.l4. 1 .P.Seq 



379221 



RTA00002682F.n.0 1.1. P.Seq 



373532 



RTA00002672F.d. 1 0.2. P.Seq 



375633 



RTA00002677F.m.05.2.P.Seq 



378356 



RTA00002681F.f.07.i. P.Seq 



76196 



RTA00O02674F.m.I2.I.P.Seq 



3751 15 



RTA00002673F.d.24.2. P.Seq 



3751 15 



RTA00002673F,e.01.2.P.Seq 



378600 



RTA00002679F.i.03.1. P.Seq 



375351 



RTA00002680F.e. 15.1. P.Seq 



25237 



RTA000027I0F.n.23.I.P.Seq 



19350; 



RTA00002663F.n.l5.I.P.Seq 



428268 



.RTA00002667F.6.0U. P.Seq 



379440 



RTA00002633F.j.2|.2.P.Seq 



MO0O38635B:C08 CH09LNL 



MO0033570C:CI0 



MO00328O8B:GIO 



M000400I7D:GO: 



M0003399I A; DO 



M000394I7B:F0I 



M00039366B:A03 



M00039170C:F05 



M00039066D:G08 



M00039686C:E06 



CH09LNL 



CH08LNH 



CH09LNL 



CH09LNL 



CH09LNL 



CH09LNL 
CH09LNL 
CH09LNL 



M00039066D:G08 CH09LNL 



CH09LNL 



M0O039792A.304 CH09LNL 



M0002267IB:A08 CH03MAH 



M00023039D:305 CH03MAH 



M00032724A:C05 CH08LNH



M00040030C:C06 



CH09LNL 






SEQ ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
565 | 374502 | RTA00002673F.i.03.2.P.Seq | F | M00039080C:H06 | CH09LNL
566 | 240615 | RTA00002672F.e.19.2.P.Seq | F | M00033995D:E05 | CH09LNL
567 | 379207 | RTA00002670F.b.07.2.P.Seq | F | M00033306D:G08 | CH09LNL
568 | 427893 | RTA00002665F.k.19.1.P.Seq | F | M00031419D:C04 | CH08LNH
569 | 377530 | RTA00002684F.g.19.2.P.Seq | F | M00040305A:D11 | CH09LNL
570 | 429707 | RTA00002668F.C.11.1.P.Seq | F | M00032918C:B10 | CH08LNH
571 | 427610 | RTA00002665F.i.04.1.P.Seq | F | M00028770A:D04 | CH08LNH
572 | 100699 | RTA00002662F.b.22.2.P.Seq | F | M00006680B:D02 | CH02COH
573 | 378974 | RTA00002682F.m.21.1.P.Seq | F | M00040017A:C06 | CH09LNL
574 | 373607 | RTA00002674F.d.15.1.P.Seq | F | M00039127D:E10 | CH09LNL
575 | 262951 | RTA00002665F.d.04.3.P.Seq | F | M00023215D:F03 | CH08LNH
576 | 30748 | RTA00002713F.e.11.1.P.Seq | F | M00027301B:B08 | CH04MAL
577 | 161116 | RTA00002714F.C.11.1.P.Seq | F | M00027837C:D09 | CH04MAL
578 | 379211 | RTA00002682F.p.20.1.P.Seq | F | M00040029A:G04 | CH09LNL
579 | 430689 | RTA00002669F.L24. 1 .P.Seq | F | M00033243B:A05 | CH08LNH
580 | 374122 | RTA00002673F.l.22.2.P.Seq | F | M00039104D:C09 | CH09LNL
581 | 376521 | RTA00002677F.h.06.2.P.Seq | F | M0003939SA:BI0 | CH09LNL
582 | 372334 | RTA00002670F.b.12.2.P.Seq | F | M00033303B:G05 | CH09LNL
583 | 379014 | RTA00002682F.O.02.1.P.Seq | F | M00040022C:D06 | CH09LNL
584 | 376344 | RTA00002677F.b.18.2.P.Seq | F | M00039340B:G08 | CH09LNL
585 | 376485 | RTA00002676F.f.01.2.P.Seq | F | M0003923SCBI 1 | CH09LNL
586 | 21661 | RTA00002709F.e.18.1.P.Seq | F | M00005820C:E04 | CH02COH
587 | 376539 | RTA0OO02675F.b. IS. LP.Seq | F | M00039211A:C12 | CH09LNL
588 | 431645 | RTA00002669F.h.15.3.P.Seq | F | M00033223B:H07 | CH08LNH
589 | 163293 | RTA00002714F.C.20.1.P.Seq | F | M0002SI20D.F12 | CH04MAL
590 | 178614 | RTA00002713F.c.20.1.P.Seq | F | M00027263A:F10 | CH04MAL


591 


373274 


RTA00002670F.i.22.2.P.Seq 


F 


M00033432B:HI0 


CH09LNL 


1 592 


379820 


RTA00002679F. f. 1 5. LP.Seq 


F 


M00039677A:B08 


CH09LNL 


593 


1 160536 


RTA0000266 jhh 10. LP.SeqT 




MOUO.'JJJJC.AIJ 


CH03MAH 


594 


373313 


RTA0000267 I F.m.02. 1 .P.Seq 


F 


M00038328D:A03 


CH09LNL 


595 


26429 


RTA000027l2F.k.23. LP.Seq 


F 


M00027022D:G1 1 


CH04MAL 


I 596 


17983 


RTA000027 1 IF.f. 10.1. P.Seq 


F 


M00022979A.D05 


CH03MAH 


597 


375338 


RTA0000268lF.j.22.2.P.Seq 


F 


M00039888B:D03 


CH09LNL 


598 


63005 


RTA00002712F.m.2 l.l.P.Seq 


F 


M00027094A;B03 


CH04MAL 


599 


23030 


RTA00002709F.b. i 0.2.P.Seq 


F 


M00005334A:C1 I 


CH02COH 


600 


372946 


RTA00002670F.L07.2. P.Seq 


F 


M00033457D:A05 


CH09LNL 


601 


375351 


RTA00002680F.e. 15.2. P.Seq 


F 


M00039792A:B04 


CH09LNL 


602 


374502 


RTA00002673F.i.0S.i. P.Seq 


F 


M00039080CH06 


CH09LNL 


603 


37691 1 


RTA00002682F.e.09. LP.Seq 


F 


M00039938C:A0S 


CH09LNL 


604 


376024 


RTA00002675F.n.I5.I.P.Seq 


F 1 


M00039257D:C03 


CH09LNL 


605 


377194 


RTA00002679F.h. 20. LP.Seq 


F 


M000396S5A:A08 


CH09LNL 


606 


379643 


RTA000026S2F.g.OS. LP.Seq 


F J 


M00039978A:G05 


CH09LNL 


607 


379610 


RTA00002680F.k.l l.l.P.Seq 


F 


M00039S15C:F09 


CH09LNL 


608 


25613 


RTA0000271 I F.g.06. LP.Seq 


F 


M00023024D:FI2 


CH03.VIAH 


609 


207466 


RTA0O002664F.j.0S. LP.Seq 


F 


M00027733A:A02 


CH04MAL 1 


610 


400052 


RTA0O0026S7F.h. 13. 1 .P.Seq | 


F 


M00040291D:C05 


CHI4EDT 


611 


21290 


RTA00002712F.g.O l.l.P.Seq | 


F 1 


M00026S59D:D01 


CH04MAL 






SEQ ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
612 | 375975 | RTA00002675F.n.18.1.P.Seq |  | M00039258D:B08 | CH09LNL
613 | 46804 | RTA00002712F.n.19.1.P.Seq |  | M00027121D:C05 | CH04MAL
614 | 69863 | RTA00002712F.U3.1.P.Seq |  | M00026935C:B04 | CH04MAL
615 | 375285 | RTA00002676F.g.18.2.P.Seq |  | M00039298B:B06 | CH09LNL
616 | 373000 | RTA00002670F.j.13.2.P.Seq |  | M00033437C:C03 | CH09LNL
617 | 378679 | RTA00002681F.f.16.1.P.Seq |  | M00039869B:F06 | CH09LNL
618 | 45407 | RTA00002712F.k.11.1.P.Seq |  | M00027016A:B06 | CH04MAL
619 | 16838 | RTA00002712F.e.23.1.P.Seq |  | M00026803A:F08 | CH04MAL
620 | 86425 | RTA00002713F.c.04.1.P.Seq |  | M00027236A:E04 | CH04MAL
621 | 376485 | RTA00002676F.e.24.2.P.Seq |  | M0003928SC:B11 | CH09LNL
622 | 41108 | RTA00002712F.n.12.1.P.Seq |  | M00027108C:B03 | CH04MAL
623 | 430876 | RTA00002669F.c.02.1.P.Seq |  | M00033136C:Dt1 | CH08LNH
624 | 1857(6 | RTA00002713F.l.07.1.P.Seq |  | M00027537C:B01 | CH04MAL
625 | 85338 | RTA00002712F.b.18.1.P.Seq |  | M00023353D:C12 | CH04MAL
626 | 185597 | RTA00002713F.m.23.1.P.Seq |  | M00027596A:A10 | CH04MAL
627 | 139348 | RTA00002713F.k.23.1.P.Seq |  | M00027526D:FO; | CH04MAL
628 | 454665 | RTA00002693F.d.15.2.P.Seq |  | M0004316JC:E12 | CH19COP
629 | 186387 | RTA00002713F.l.01.1.P.Seq |  | M00027528C:B10 | CH04MAL
630 | 186387 | RTA00002713F.k.24.1.P.Seq |  | M00027528C:B10 | CH04MAL
631 | 2109; | RTA00002708F.h.20.1.P.Seq |  | M0000430SC:C06 | CH01COH
632 | 20S27 | RTA00002710F.c.23.1.P.Seq |  | M00021671D:F1 | CH03MAH
633 | 21290 | RTA00002712F.f.24.1.P.Seq |  | M000268f3D:D01 | CH04MAL
634 | 17646 | RTA00002710F.d.22.1.P.Seq |  | M0002190SD:G12 | CH03MAH
635 | 402817 | RTA00002686F.a.10.1.P.Seq |  | M00039736D:G08 | CH13EDT
636 | 42854 | RTA00002713F.n.09.1.P.Seq |  | M00027615A:F10 | CH04MAL
637 | 430876 | RTA00002669F.c.02.3.P.Seq |  | M000331S6C:D1 | CH08LNH
638 | 78641 | RTA00002679F.a.21.2.P.Seq |  | M0003965:C:E0S | CH09LNL
639 | 375843 | RTA00002674F.m.03.2.P.Seq |  | M00039163C:A04 | CH09LNL
640 | 36165 | RTA00002708F.L06.1.P.Seq |  | M000043-iOC:C07 | CH01COH
641 | 456506 | RTA00002694F.d.05.1.P.Seq |  | M00043402A:E01 | CH20COHLV
642 | :74450 | RTA00002672F.i.05.2.P.Seq |  | M000390UA:H10 | CH09LNL
643 | 378949 | RTA00002633F.o.21.2.P.Seq |  | M00040100D:B06 | CH09LNL
644 | 373313 | RTA00002671F.m.02.2.P.Seq |  | M00038328D:A03 | CH09LNL
645 | 377861 | RTA00002681F.m.20.1.P.Seq |  | M00039898A:A08 | CH09LNL
646 | 431196 | RTA00002669F.f.07.2.P.Seq |  | M0005320-B:A07 | CH08LNH
647 | 2795 | RTA00002683F.a.06.1.P.Seq |  | M00040032A:B03 | CH09LNL
648 | 42340 | RTA00002661F.b.03.1.P.Seq |  | M0000143*?C:H06 | CH01COH
649 | 374410 | RTA00002674F.U1.1.P.Seq |  | M0003915SB:G12 | CH09LNL
650 | 374623 | RTA00002674F.a.01.2.P.Seq |  | M00039113D:A06 | CH09LNL
651 | 431612 | RTA00002669F.e.23.2.P.Seq |  | M0003320:D:G06 | CH08LNH
652 | 240615 | RTA00002672F.e.19.1.P.Seq |  | M000389OfD:E05 | CH09LNL
653 | 423508 | RTA00002666F.d.01.1.P.Seq |  | M00032545B:H09 | CH08LNH
654 | 235780 | RTA00002666F.d.03.1.P.Seq |  | M0003254fD:G05 | CH08LNH
655 | 1*890 | RTA00002710F.e.11.1.P.Seq |  | M000219??A:H02 | CH03MAH
656 | 20100 | RTA00002710F.g.11.1.P.Seq |  | M00022riD:D12 | CH03MAH
657 | 4458 | RTA00002710F.g.1S.1.P.Seq |  | M000221S-C:C11 | CH03MAH
658 | 373347 | RTA00002681F.h.07.2.P.Seq |  | M00039875D:A10 | CH09LNL






660 



661 



662 



663 



664 



665 



666 



667 



668 



669 



670 



671 



672 



673 



674 
675 



676 



67' 



678 



679 



680 



682 



683 



684 



685 



686 



688 



689 



690 



692 



693 



694 



695 



696 



697 



698 



699 



700 



701 



702 



703 



704 



705 



CLUSTER 
373477 



15596 



21028 



37406; 



380686 



402950 



428064 



23310 



376233 



375848 



24225 I 



74064 



146260 



375575 



355518 



184223 



213306 



429566 



378656 



427760 



372795 



SEQ NAME 
RTA00002672F.b.23. 1 .P.Seq 



RTA000027IOF.g.02.1.P.Seq 



RTA00002709F. 1.09.1 .P.Seq 



RTA00002672F.h.l5.2.P.Seq 



RTA00002634F.a.03.2.P.Seq 



RTA00002686F.2.1 1. 1. P.Seq 



RTA00002665F.1.04. 1 .P.Seq 



RTA00002708F.e. 1 0. 1 .P.Seq 



RTA00002677F.b. 1 5.2.P.Seq 



RTA00002674F.m.03.1. P.Seq 



RTA00002665F.LOS. 1. P.Seq 



RTA00002672F.f. 1 5. 2. P.Seq 



RTA00002663F.d. 17.1. P.Seq 



RTA00002677F.e.22.2.P.Seq 



RTA00002665F.c.l5.3.P.Seq 



RTA00002662F.b.03.2.P.Seq 



RTA00002664F.e.07.2.P.Seq 



RTA00002668F.b.04.1 .P.Seq 



RTA0OO0:6S2F.c.09.1. P.Seq 



RTA00002668F.e.23.1. P.Seq 



RTA00002683F.a.06.2.P.Seq 



429340 



429822 



375224 



378347 



380109 



379001 



375348 



42988; 



377067 



378001 



45298 



375431 



377861 



428610 



20765 



2760 



430540 



381024 



16454 



372898 



373681 



82260 



377343 



374747 



185848 



RTA00002666F.f.l2.1. P.Seq 



RTA00002668F.e. 17.1. P.Seq 



RTA0000268OF.d.22.2.P.Seq 



RTA0000268 1 F.h.07.1 .P.Seq 



RTA00002682F.U7.1. P.Seq 



RTA000026S3F.O.02. 1 .P.Seq 



RTA00002676F.L 12.5. P.Seq 



RTA00002667F.g.05. 1 .P.Seq 



RTA00002682F.1.24.1. P.Seq 



RTA00002681 F.m.22.2.P.Seq 



RTA000027lOF.j.21.1.P.Seq 



RTA00002630F.f.Q3. 1 .P.Seq 



RTA00Q0268 1 F.m.20.2. P.Seq 



RTA0000:667F.e.Q9. 1 .P.Seq 



RTA000027lOF.i.lO.I.P.Seq 



RTA00OO27l3F.e.23.1.P.Seq 



RTA00002663F.o.20.2.P.Seq 



RTA00002670F.h.23.2. P.Seq 



RTA00002709F.f.07.I.P.Seq 



RTA00002670F.i.03.2.P.Seq 



RTA00002671F.d.20.I.P.Seq 



RTA00002634F.h.06.2.PSeq 



RTA00002684F.g.04.2.P.Seq 



JlTA00002676F.e.Q7.2.P.Seq 



RTA00002"12F.m.l l.l.P.Sea 



ORIENTATION 
F 



CLONE ID 
M00058639B:C03 



M00022I35CB05 



M00007108B.A02 



M000390I 1D:C10 



M00040I07B:H07 



M00040181B:H09 



M0003I485D:G02 



M00004046C:A08 



M00039339C.F03 



M00039I68C.A04 



M00028772C.B09 



M00038999D:CI 1 



M00022099B:D06 



M00039385B:E09 



M0002S20I3.HI2 



M00005539D:G0I 



M00027078A:B02 



M00032907A:G04 



M00039927A:F04 



M00052940A:C02 



M00040032A:B03 



M00032577A:CO4 



M00032939B:E07 



M00039783B:A06 



M00O39875D:AIO 



M0003998"C:G08 



M000400^7A:C12 



M0O0i93O4D:B09 



MO0O32793A.F06 



M000400U3:D01 



M000398^8D:C06 



M00O22455A:E02 



M00039793D:C05 



M00059898A:A08 



M00052.766C:A04 



M00022363C:G12 



M000273 UC:D09 



MO0O53UOD:F06 



M00OJ3424B:AO4 



LIBRARY 
CH09LNL 



CH03MAH 



CH02COH 



CH09LNL 



CH09LNL 



CH15EDT 



CH08LNH 



CH01COH 



CH09LNL 



CH09LNL 



CHOSLNH 



CH09LNL 



CH05MAH 



CH09LNL 



CHOSLNH 



CH02COH 



CH04MAL 



CHOSLNH 



CH09LNL 



CHOSLNH 



CH09LNL 



CHOSLNH 



CHOSLNH 



CH09LNL 



CH09LNL 



CH09LNL 



CH09LNL 



CH09LNL 



CHOSLNH 



CH09LNL 



CH09LNL 



CH03MAH 



CH09LNL 



CH09LNL 



CHOSLNH 



CH03MAH 



CH04MAL 



CHOSLNH 



CH09LNL 



M000065^9D:B02 



M00053424D:H12 



M0005S272D:F1 1 



MO004O3O-B:F0l 



M00040302C:A04 



M000392S6A;C06 



mooo::oso.a:boi 



CH02COH 



CH09LNL 



CH09LNL 



CH09LNL 



CH09LNL 



CH09LNL 



CH04MAL 






SEQ ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
706 | 374311 | RTA00002676F.e.18.2.P.Seq | F | M0003923"C:A06 | CH09LNL
707 | 278923 | RTA00002667F.b.10.1.P.Seq | F | M00032726C:C01 | CH08LNH
708 | 378667 | RTA00002681F.b.11.2.P.Seq | F | M00039847A:F06 | CH09LNL
709 | 380454 | RTA00002673F.j.16.1.P.Seq | F | M0003908JD:D07 | CH09LNL
710 | 381576 | RTA00002670F.i.04.2.P.Seq | F | M00033425A:C10 | CH09LNL
711 | 375067 | RTA00002675F.o.03.1.P.Seq | F | M00039260C:G03 | CH09LNL
712 | 89706 | RTA00002714F.a.11.1.P.Seq | F | M00027741B:F09 | CH04MAL
713 | 10583 | RTA00002711F.h.11.1.P.Seq | F | M00023100A:E12 | CH03MAH
714 | 379982 | RTA00002632F.i.16.1.P.Seq | F | M00039987C:E12 | CH09LNL
715 | 378532 | RTA00002680F.n.04.3.P.Seq | F | M00039828B:C05 | CH09LNL
716 | 379776 | RTA00002680F.a.22.2.P.Seq | F | M00039774C:A03 | CH09LNL
717 | 374136 | RTA00002673F.f.16.1.P.Seq | F | M00039072C:C03 | CH09LNL
718 | 98471 | RTA00002663F.j.21.1.P.Seq | F | M00022670D:H11 | CH03MAH
719 | 125365 | RTA00002668F.j.07.1.P.Seq | F | M00033019B:E10 | CH08LNH
720 | 375431 | RTA00002680F.f.03.2.P.Seq | F | M00039793D:C05 | CH09LNL
721 | 62826 | RTA00002661F.g.20.1.P.Seq | F | M00004105D:D05 | CH01COH
722 | 379972 | RTA00002679F.e.10.1.P.Seq | F | M00039672D:D10 | CH09LNL
723 | 377554 | RTA00002679F.f.10.1.P.Seq | F | M00039675D:B03 | CH09LNL
724 | 230479 | RTA00002664F.c.16.2.P.Seq | F | M00026915B:C06 | CH04MAL
725 | 98872 | RTA00002663F.j.19.1.P.Seq | F | M00022668B:B12 | CH03MAH
726 | 42635 | RTA00002679F.h.18.1.P.Seq | F | M00039634D:B08 | CH09LNL
727 | 379044 | RTA00002679F.a.10.2.P.Seq | F | M00039652B:D05 | CH09LNL
728 | 96093 | RTA00002663F.j.07.1.P.Seq | F | M00022640C:C12 | CH03MAH
729 | 403642 | RTA00002687F.d.01.2.P.Seq | F | M00039945C:F09 | CH14EDT
730 | 400921 | RTA00002685F.b.18.2.P.Seq | F | M00039371B:H06 | CH12EDT
731 | 93587 | RTA00002663F.k.10.1.P.Seq | F | M00022731A:D02 | CH03MAH
732 | 79951 | RTA00002713F.c.18.1.P.Seq | F | M00027258A:A07 | CH04MAL
733 | 176509 | RTA00002686F.b.09.1.P.Seq | F | M00039756B:H06 | CH13EDT
734 | 451753 | RTA00002694F.e.06.1.P.Seq | F | M00043634A:C10 | CH20COHLV
735 | 186266 | RTA00002713F.c.16.1.P.Seq | F | M00027256B:H09 | CH04MAL
736 | 235052 | RTA00002692F.a.15.2.P.Seq | F | M00042626B:D08 | CH18CON
737 | 377233 | RTA00002682F.e.23.1.P.Seq | F | M00039*)40D:G08 | CH09LNL
738 | 378532 | RTA00002680F.n.04.2.P.Seq | F | M00039828B:C05 | CH09LNL
739 | 177932 | RTA00002713F.b.22.1.P.Seq | F | M00027233B:C01 | CH04MAL
740 | 9332 | RTA00002712F.p.18.1.P.Seq | F | M00027179D:E06 | CH04MAL
741 | 240318 | RTA00002687F.d.04.2.P.Seq | F | M00039947A:D06 | CH14EDT
742 | 404260 | RTA00002687F.c.11.2.P.Seq | F | M00039942D:C01 | CH14EDT
743 | 93767 | RTA00002712F.g.09.1.P.Seq | F | M0002686SC:E11 | CH04MAL
744 | 185642 | RTA00002712F.f.20.1.P.Seq | F | M00026S56D:F02 | CH04MAL
745 | 447544 | RTA00002689F.e.18.3.P.Seq | F | M00042905D:D02 | CH15CON
746 | 403274 | RTA00002687F.b.10.2.P.Seq | F | MQ003 l )766A:G07 | CH14EDT
747 | 404257 | RTA00002687F.g.06.1.P.Seq | F | M00040208A:C03 | CH14EDT
748 | 403868 | RTA00002687F.k.05.2.P.Seq | F | M00040318C:H11 | CH14EDT
749 | 450074 | RTA00002691F.e.12.2.P.Seq | F | M000433^2D:C11 | CH17COHLV
750 | 404520 | RTA00002687F.f.05.2.P.Seq | F | M00040202A:F05 | CH14EDT
751 | 451789 | RTA00002692F.b.04.2.P.Seq | F | M00042956C:B06 | CH18CON
752 | 455178 | RTA00002694F.b.19.1.P.Seq | F | M00043447A:C07 | CH20COHLV






SEQ ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
753 | 455136 | RTA00002694F.x08.1.P.Seq | F | M00042595A:B01 | CH20COHLV
754 | 379001 | RTA00002683F.o.02.2.P.Seq | F | M00040097A:C12 | CH09LNL
755 | 374763 | RTA00002673F.p.2U.P.Seq | F | M00039118B:C05 | CH09LNL
756 | 402508 | RTA00002686F.o.15.1.P.Seq | F | M00040281D:B01 | CH13EDT
757 | 431370 | RTA00002669F.m.04.3.P.Seq | F | M00033233B:D12 | CH08LNH
758 | 380500 | RTA00002670F.p.19.1.P.Seq | F | M00033583B:E06 | CH09LNL
759 | 376743 | RTA00002678F.e.22.2.P.Seq | F | M00039461A:F04 | CH09LNL
760 | 191690 | RTA00002673F.m.19.1.P.Seq | F | M00039107C:E04 | CH09LNL
761 | 374264 | RTA00002671F.p.21.2.P.Seq | F | M00033620B:E09 | CH09LNL
762 | 373020 | RTA00002671F.b.20.2.P.Seq | F | M00033595A:C11 | CH09LNL
763 | 375231 | RTA00002671F.m.20.2.P.Seq |  | M00038387B:A07 | CH09LNL
764 | 16180 | RTA00002709F.j.17.1.P.Seq | F | M00006977D:A03 | CH02COH
765 | 379403 | RTA00002683F.c.17.2.P.Seq | F | M00040041C:C09 | CH09LNL
766 | 375382 | RTA00002677F.d.24.2.P.Seq | F | M00039381D:C02 | CH09LNL
767 | 379653 | RTA00002683F.c.03.2.P.Seq | F | M00040033D:G04 | CH09LNL
768 | 377858 | RTA00002681F.e.14.2.P.Seq | F | M00039364A:A07 | CH09LNL
769 | 430861 | RTA00002668F.h.18.1.P.Seq | F | M00032995C:C05 | CH08LNH
770 | 37612S | RTA00002677F.a.11.2.P.Seq | F | M00039334B:E03 | CH09LNL
771 | 375009 | RTA00002676F.n.20.2.P.Seq | F | M00039322A:F04 | CH09LNL
772 | 429816 | RTA00002667F.n.22.1.P.Seq | F | M00032871D:E11 | CH08LNH
773 | 375657 | RTA00002681F.h.13.2.P.Seq | F | M00039377C:C03 | CH09LNL
774 | 427839 | RTA00002666F.b.U.1.P.Seq | F | M00032530D:C02 | CH08LNH
775 | 376761 | RTA00002677F.g.03.2.P.Seq | F | M00039391D:F08 | CH09LNL
776 | 44025 | RTA00002684F.b.24.2.P.Seq | F | M00040115B:A04 | CH09LNL
777 | 44025 | RTA00002684F.c.01.2.P.Seq | F | M00040115B:A04 | CH09LNL
778 | 392524 | RTA00002631F.p.04.2.P.Seq | F | M00039909D:C02 | CH09LNL
779 | 427252 | RTA00002665F.b.13.1.P.Seq | F | M00023135B:A06 | CH08LNH
780 | 374927 | RTA00002673F.e.12.1.P.Seq | F | M00039068C:E06 | CH09LNL
781 | 373226 | RTA00002680F.g.09.1.P.Seq | F | M00039797C:G05 | CH09LNL
782 | 217964 | RTA00002664F.g.03.2.P.Seq |  | M00027299B:B12 | CH04MAL
783 | 376363 | RTA00002677F.b.14.2.P.Seq | F | M00039339A:H07 | CH09LNL
784 | 377719 | RTA00002677F.j.11.2.P.Seq | F | M00039407B:G02 | CH09LNL
785 | 378081 | RTA00002677F.e.16.2.P.Seq | F | M00039334C:E02 | CH09LNL
786 | 89267 | RTA00002662F.b.01.2.P.Seq | F | M00005445D:B01 | CH02COH
787 | 374927 | RTA00002673F.e.12.2.P.Seq | F | M00039068C:E06 | CH09LNL
788 | 279054 | RTA00002667F.b.23.1.P.Seq | F | M00032731B:C10 | CH08LNH
789 | 377283 | RTA00002682F.m.19.1.P.Seq | F | M00040016C:H12 | CH09LNL
790 | 45318 | RTA00002710F.I.05.1.P.Seq | F | M00022533A:A0S | CH03MAH
791 | 188292 | RTA00002664F.e.23.2.P.Seq | F | M00027162B:F05 | CH04MAL
792 | 378872 | RTA00002683F.c.20.2.P.Seq | F | M00040042B:A10 | CH09LNL
793 | 427252 | RTA00002665F.b.13.3.P.Seq |  | M000281S?B:A06 | CH08LNH
794 | 330618 | RTA00002673F.j.12.2.P.Seq |  | M00039084C:G07 | CH09LNL
795 | 35646 | RTA00002667F.g.16.1.P.Seq |  | M00032797B:G02 | CH08LNH
796 | 46407 | RTA00002665F.c.10.3.P.Seq |  | M00023196D:A03 | CH08LNH
797 | 373720 | RTA00002674F.c.04.1.P.Seq |  | M00039124C:F03 | CH09LNL
798 | 429693 | RTA00002668F.f.05.1.P.Seq |  | M00032944B:B02 | CH08LNH
799 | 377108 | RTA00002673F.p.04.2.P.Seq |  | M00039636C:D11 | CH09LNL






SEQ ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
800 | 375657 | RTA00002681F.h.13.1.P.Seq | F | M00039877C:C03 | CH09LNL
801 | 374868 | RTA00002673F.d.08.2.P.Seq | F | M00039063B:D08 | CH09LNL
802 | 428716 | RTA00002667F.e.08.1.P.Seq | F | M00032766B:D12 | CH08LNH
803 | 44025 | RTA00002684F.c.01.1.P.Seq | F | M00040115B:A04 | CH09LNL
804 | 430327 | RTA00002668F.k.11.1.P.Seq | F | M00033033C:H01 | CH08LNH
805 | 374328 | RTA00002673F.c.24.1.P.Seq | F | M00039061B:F08 | CH09LNL
806 | 376946 | RTA00002682F.n.10.1.P.Seq | F | M00040019A:E01 | CH09LNL
807 | 375522 | RTA00002677F.n.08.2.P.Seq | F | M00039420D:D03 | CH09LNL
808 | 395617 | RTA00002687F.b.15.2.P.Seq | F | M00039767B:A04 | CH14EDT
809 | 21686 | RTA00002712F.g.05.1.P.Seq | F | M00026865B:A06 | CH04MAL
810 | 452038 | RTA00002692F.a.09.2.P.Seq | F | M00042623D:D07 | CH18CON
811 | 25632 | RTA00002711F.g.16.1.P.Seq | F | M00023042D:D02 | CH03MAH
812 | 152487 | RTA00002663F.e.12.1.P.Seq | F | M00022181C:D01 | CH03MAH
813 | 378226 | RTA00002680F.g.09.2.P.Seq | F | M00039797C:G05 | CH09LNL
814 | 402446 | RTA00002686F.c.04.1.P.Seq | F | M00040133B:B03 | CH13EDT
815 | 403642 | RTA00002687F.c.24.2.P.Seq | F | M00039945C:F09 | CH14EDT
816 | 186359 | RTA00002713F.g.24.1.P.Seq | F | M00027379C:B07 | CH04MAL
817 | 404290 | RTA00002688F.e.04.2.P.Seq | F | M00040395B:DM | CH14EDT
818 | 375443 | RTA00002676F.g.19.2.P.Seq | F | M00039298B:D03 | CH09LNL
819 | 380279 | RTA00002673F.i.24.1.P.Seq | F | M00039082B:A05 | CH09LNL
820 | 386110 | RTA00002687F.e.06.1.P.Seq | F | M00039955C:C04 | CH14EDT
821 | 380279 | RTA00002673F.j.01.1.P.Seq | F | M00039082B:A05 | CH09LNL
822 | 386986 | RTA00002675F.p.06.1.P.Seq | F | M00039266A:B02 | CH09LNL
823 | 186359 | RTA00002713F.h.01.1.P.Seq | F | M00027379C:B07 | CH04MAL
824 | 375611 | RTA00002677F.o.20.2.P.Seq | F | M00039425D:E12 | CH09LNL
825 | 378285 | RTA00002679F.h.01.1.P.Seq | F | M00039681B:H09 | CH09LNL
826 | 44025 | RTA00002684F.b.24.1.P.Seq | F | M00040115B:A04 | CH09LNL
827 | 25240 | RTA00002711F.c.12.1.P.Seq | F | M00022854A:B03 | CH03MAH
828 | 403700 | RTA00002687F.g.03.2.P.Seq | F | M00040207B:D08 | CH14EDT
829 | 404679 | RTA00002687F.f.07.1.P.Seq | F | M00040203A:H06 | CH14EDT
830 | 454806 | RTA00002693F.b.12.2.P.Seq | F | M00043093C:G11 | CH19COP
831 | 376829 | RTA00002674F.f.21.2.P.Seq | F | M00039135D:G02 | CH09LNL
832 | 456309 | RTA00002694F.d.16.1.P.Seq | F | M00043518B:D06 | CH20COHLV
833 | 374510 | RTA00002672F.i.17.2.P.Seq | F | M00039015D:H04 | CH09LNL
834 | 377232 | RTA00002685F.m.08.2.P.Seq |  | M00040090B:G09 | CH09LNL
835 | 375779 | RTA00002672F.j.20.2.P.Seq | F | M00039025A:H09 | CH09LNL
836 | 90746 | RTA00002671F.a.07.2.P.Seq | F | M00033585D:A02 | CH09LNL
837 | 453002 | RTA00002692F.b.21.2.P.Seq | F | M00042970C:H10 | CH18CON
838 | 402863 | RTA00002686F.n.12.1.P.Seq | F | M00040273B:H12 | CH13EDT
839 | 402526 | RTA00002686F.p.07.1.P.Seq |  | M00040286C:C02 | CH13EDT
840 | 412778 | RTA00002685F.i.07.1.P.Seq | F | M00039533D:F04 | CH12EDT
841 | 402273 | RTA00002686F.j.18.1.P.Seq | F | M00040233C:G05 | CH13EDT
842 | 374744 | RTA00002670F.i.16.1.P.Seq | F | M00033427D:F01 | CH09LNL
843 | 375764 | RTA00002677F.o.1S.2.P.Seq | F | M00039425C:G0! | CH09LNL
844 | 428218 | RTA00002667F.c.01.1.P.Seq | F | M00032731C:C07 | CH08LNH
845 | 374809 | RTA00002675F.h.01.1.P.Seq | F | M00039230D:D09 | CH09LNL
846 | 20162 | RTA00002710F.n.20.1.P.Seq | F | M00022662D:G11 | CH03MAH






SEQ ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
847 | 375782 | RTA00002677F.d.23.2.P.Seq | F | M0003938;C:H08 | CH09LNL
848 | 372958 | RTA00002672F.c.02.1.P.Seq | F | M00038639D:F07 | CH09LNL
849 | 403940 | RTA00002688F.d.07.2.P.Seq | F | M0004038*D:H05 | CH14EDT
850 | 8490 | RTA00002711F.g.03.1.P.Seq | F | M00023020C:G08 | CH03MAH
851 | 374809 | RTA00002675F.g.24.1.P.Seq | F | M00039230D:D09 | CH09LNL
852 | 377788 | RTA00002684F.g.24.2.P.Seq | F | M00040305C:H06 | CH09LNL
853 | 13847 | RTA00002711F.f.09.1.P.Seq | F | M00022976OF04 | CH03MAH
854 | 374172 | RTA00002673F.k.16.1.P.Seq | F | M00039097D:D06 | CH09LNL
855 | 380314 | RTA00002682F.l.07.1.P.Seq | F | M00040009D:B07 | CH09LNL
856 | 47231 | RTA00002714F.b.15.1.P.Seq | F | M00027813C:F01 | CH04MAL
857 | 400287 | RTA00002685F.k.10.1.P.Seq | F | M00039584C:C01 | CH12EDT
858 | 400533 | RTA00002685F.a.02.2.P.Seq | F | M0003913:D:E05 | CH12EDT
859 | 447594 | RTA00002689F.c.07.1.P.Seq | F | M00042696B:E05 | CH15CON
860 | 147357 | RTA00002711F.e.15.1.P.Seq | F | M00022923B:C01 | CH03MAH
861 | 401141 | RTA00002685F.o.22.2.P.Seq | F | M00039642D:B12 | CH12EDT
862 | 404620 | RTA00002687F.c.03.1.P.Seq | F | M00039770A:G11 | CH14EDT
863 | 24360 | RTA00002709F.l.20.1.P.Seq | F | M00007149A:G02 | CH02COH
864 | 380618 | RTA00002673F.j.12.1.P.Seq | F | M00039084C:G07 | CH09LNL
865 | 448446 | RTA00002690F.d.09.3.P.Seq | F | M00042797D:D10 | CH16COP
866 | 402313 | RTA00002686F.f.13.1.P.Seq | F | M00040174D:G03 | CH13EDT
867 | 273151 | RTA00002685F.c.05.2.P.Seq | F | M00039374C:H02 | CH12EDT
868 | 404172 | RTA00002687F.d.17.2.P.Seq | F | M00039951B:B12 | CH14EDT
869 | 263630 | RTA00002694F.e.10.1.P.Seq | F | M00043637C:H01 | CH20COHLV
870 | 404277 | RTA00002687F.d.18.1.P.Seq | F | M00039951B:C03 | CH14EDT
871 | 403557 | RTA00002687F.d.10.1.P.Seq | F | M00039948A:E03 | CH14EDT
872 | 375161 | RTA00002676F.m.24.2.P.Seq | F | M00039319B:H12 | CH09LNL
873 | 376829 | RTA00002674F.f.21.1.P.Seq | F | M00039135D:G02 | CH09LNL
874 | 372953 | RTA00002672F.c.02.2.P.Seq | F | M00038639D:F07 | CH09LNL
875 | 1 1 *7R | RTA00002709F.a.24.1.P.Seq | F | M00005351C:G05 | CH02COH
876 | 402506 | RTA00002686F.b.17.1.P.Seq | F | M00039760B:B03 | CH13EDT
877 | 141731 | RTA00002713F.b.04.1.P.Seq | F | M000272i:D:E03 | CH04MAL
878 | 37411 | RTA00002661F.e.11.1.P.Seq | F | M00003770A:E05 | CH01COH
879 | 372537 | RTA00002670F.c.05.2.P.Seq | F | M0003334:D:A09 | CH09LNL
880 | 380834 | RTA00002670F.c.03.2.P.Seq | F | M00033346C:A05 | CH09LNL
881 | 401492 | RTA00002685F.n.17.2.P.Seq |  | M00039609D:F07 | CH12EDT
882 | 99993 | RTA00002662F.b.23.2.P.Seq | F | M000067i:C:H09 | CH02COH
883 | 404311 | RTA00002688F.d.21.2.P.Seq | F | M00040394A:D04 | CH14EDT
884 | 231084 | RTA00002664F.c.18.2.P.Seq | F | M0002691SB:D01 | CH04MAL
885 | 447679 | RTA00002689F.b.11.3.P.Seq | F | M00042560A:F12 | CH15CON
886 | 377012 | RTA00002682F.d.17.1.P.Seq | F | M00039936C:C05 | CH09LNL
887 | 226207 | RTA00002664F.d.21.2.P.Seq |  | M00027035D:C06 | CH04MAL
888 | 446183 | RTA00002689F.a.12.1.P.Seq | F | M00042534A:A05 | CH15CON
889 | 42850S | RTA00002666F.c.24.1.P.Seq | F | M00032545B:H09 | CH08LNH
890 | 15764S | RTA00002714F.b.20.1.P.Seq | F | M00027818C:C07 | CH04MAL
891 | 404609 | RTA00002688F.b.15.2.P.Seq | F | M00040377C:G07 | CH14EDT
892 | 400464 | RTA00002685F.l.10.1.P.Seq | F | M00039590D:D02 | CH12EDT
893 | 379108 | RTA00002685F.l.12.1.P.Seq | F | M00039591C:D06 | CH12EDT







SEQ ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
894 | 374639 | RTA00002676F.d.21.2.P.Seq | F | M0003923-iD:B12 | CH09LNL
895 | 380674 | RTA00002673F.j.U.2.P.Seq | F | M00039084C:H04 | CH09LNL
896 | 380674 | RTA00002673F.j.14.1.P.Seq | F | M00039084C:H04 | CH09LNL
897 | 188972 | RTA00002664F.d.20.2.P.Seq | F | M00027030C:H06 | CH04MAL
898 | 402835 | RTA00002686F.cOM.P.Seq | F | M00040131D:G08 | CH13EDT
899 | 403774 | RTA00002687F.d.08.2.P.Seq | F | M00039947C:G03 | CH14EDT
900 | 374606 | RTA00002673F.j.23.2.P.Seq | F | M00039096A:A05 | CH09LNL
901 | 192535 | RTA00002663F.m.14.1.P.Seq | F | M00022925C:A08 | CH03MAH
902 | 377926 | RTA00002680F.l.16.2.P.Seq | F | M00039820B:B06 | CH09LNL
903 | 186055 | RTA00002712F.U1.1.P.Seq | F | M00026926A:E10 | CH04MAL
904 | 380498 | RTA00002684F.f.11.1.P.Seq | F | M00040129D:E10 | CH09LNL
905 | 400236 | RTA00002685F.i.18.2.P.Seq | F | M00059561A:B07 | CH12EDT
906 | 401070 | RTA00002688F.d.12.2.P.Seq | F | M00040390A:H02 | CH14EDT
907 | 452622 | RTA00002692F.b.14.2.P.Seq | F | M00042962D:C05 | CH18CON
908 | 235052 | RTA00002692F.a.15.1.P.Seq | F | M00042626B:D08 | CH18CON
909 | 452221 | RTA00002692F.c.13.2.P.Seq | F | M00042936C:G12 | CH18CON
910 | 404581 | RTA00002687F.g.11.2.P.Seq | F | M00040208D:G09 | CH14EDT
911 | 376925 | RTA00002687F.e.14.2.P.Seq | F | M00059957C:C09 | CH14EDT
912 | 400287 | RTA00002685F.k.10.2.P.Seq | F | M00039584C:C01 | CH12EDT
913 | 403242 | RTA00002687F.l.05.2.P.Seq | F | M00040523B:C12 | CH14EDT
914 | 453313 | RTA00002693F.a.07.2.P.Seq | F | M000426UB:B05 | CH19COP
915 | 452633 | RTA00002692F.f.11.2.P.Seq | F | M00043067D:D10 | CH18CON
916 | 447679 | RTA00002689F.b.1M.P.Seq | F | M00042560A:F12 | CH15CON
917 | 452398 | RTA00002692F.f.17.1.P.Seq | F | M00043125C:A11 | CH18CON
918 | 449797 | RTA00002691F.b.22.3.P.Seq | F | M00043334B:A10 | CH17COHLV
919 | 403916 | RTA00002687F.j.11.2.P.Seq | F | M00040314D:H05 | CH14EDT
920 | 236906 | RTA00002693F.d.05.2.P.Seq | F | M00043154A:B07 | CH19COP
921 | 404161 | RTA00002687F.e.20.2.P.Seq | F | M00039958C:B09 | CH14EDT
922 | 386110 | RTA00002687F.e.06.2.P.Seq | F | M00039955C:C04 | CH14EDT
923 | 451512 | RTA00002691F.b.02.3.P.Seq | F | M00043305B:G02 | CH17COHLV
924 | 400517 | RTA00002687F.k.15.2.P.Seq | F | M00040320D:F02 | CH14EDT
925 | 403578 | RTA00002687F.i.01.2.P.Seq | F | M00040296D:E09 | CH14EDT
926 | 403578 | RTA00002687F.h.24.2.P.Seq | F | M00040296D:E09 | CH14EDT
927 | 403371 | RTA00002687F.h.19.2.P.Seq | F | M00040294D:D12 | CH14EDT
928 | 452531 | RTA00002692F.f.16.1.P.Seq | F | M00045125A:B11 | CH18CON
929 | 454453 | RTA00002693F.f.15.2.P.Seq | F | M00043215A:D02 | CH19COP
930 | 238270 | RTA00002692F.e.07.2.P.Seq | F | M00043023A:G05 | CH18CON
931 | 14583 | RTA00002687F.f.08.2.P.Seq | F | M00040203B:A05 | CH14EDT
932 | 400464 | RTA00002685F.l.10.2.P.Seq | F | M00039590D:D02 | CH12EDT
933 | 404642 | RTA00002687F.f.02.2.P.Seq | F | M00040201C:G11 | CH14EDT
934 | 380413 | RTA00002680F.k.19.1.P.Seq | F | M00039816C:D05 | CH09LNL
935 | 237963 | RTA00002693F.c.20.2.P.Seq | F | M00043148C:A09 | CH19COP
936 | 20847 | RTA00002710F.d.09.1.P.Seq | F | M000:i852D:A05 | CH03MAH
937 | 456531 | RTA00002694F.b.18.1.P.Seq | F | M00043446C:E12 | CH20COHLV
938 | 4:0463 | RTA00002694F.a.12.1.P.Seq | F | M000425%C:D07 | CH20COHLV
939 | 456713 | RTA00002694F.d.13.1.P.Seq | F | M0004J?13D:G08 | CH20COHLV
940 | 455508 | RTA00002694F.a.15.1.P.Seq | F | M000425^B:E!2 | CH20COHLV






SEQ ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
941 | 376133 | RTA00002674F.m.05.2.P.Seq | F | M00039169A:E12 | 
942 | 402331 | RTA00002686F.m.03.1.P.Seq | F | M00040264D:G05 | CH13EDT
943 | 373820 | RTA00002674F.d.06.2.P.Seq | F | M00039127A:G11 | CH09LNL
944 | 35383 | RTA00002674F.c.06.2.P.Seq | F | M00039124C:H08 | CH09LNL
945 | 400732 | RTA00002685F.k.24.2.P.Seq | F | M0003958~C:F12 | CH12EDT
946 | 431629 | RTA00002669F.L14.1.P.Seq | F | M00033276B:G08 | CH08LNH
947 | 449349 | RTA00002690F.d.12J.P.Seq | F | M00042802C:C04 | CH16COP
948 | 401124 | RTA00002685F.o.11.1.P.Seq | F | M00039629D:B04 | CH12EDT
949 | 453233 | RTA00002693F.a.01.2.P.Seq | F | M00042611A:A06 | CH19COP
950 | 124813 | RTA00002685F.j.10.2.P.Seq | F | M00039564B:C01 | CH12EDT
951 | 454627 | RTA00002693F.f.09.2.P.Seq | F | M00043210C:E05 | CH19COP
952 | 169464 | RTA00002663F.L19.1.P.Seq | F | MOOO^O" A E09 | CH03MAH
953 | 451654 | RTA00002692F.f.02.2.P.Seq | F | M00043044D:A09 | CH18CON
954 | 406092 | RTA00002685F.k.11.2.P.Seq | F | M00039584C:C11 | CH12EDT
955 | 453501 | RTA00002693F.d.14.2.P.Seq | F | M00043 I6"*D CP | CH19COP
956 | 450845 | RTA00002691F.f.10.1.P.Seq | F | M00043410C:A09 | CH17COHLV
957 | 443177 | RTA00002690F.e.12.1.P.Seq | F | M00042839B:B11 | CH16COP
958 | 402617 | RTA00002686F.b.21.1.P.Seq | F | M00040131B:D11 | CH13EDT
959 | 378014 | RTA00002680F*17.1.P.Seq | F | M000"9799A:D10 | CH09LNL
960 | 124313 | RTA00002685F.j.10.1.P.Seq | F | M00039564B:C01 | CH12EDT
961 | 29450 | RTA00002663F.d.07.1.P.Seq | F | M000" > * > 0 < 4 A H03 | CH03MAH
962 | 400436 | RTA00002685F.e.02.1.P.Seq | F |  | CH12EDT
963 | 44753 | RTA00002713F.f.05.1.P.Seq | F | M00027324D:C05 | CH04MAL
964 | 443177 | RTA00002690F.e.12.2.P.Seq | F | M00042839B:B11 | CH16COP
965 | 447697 | RTA00002689F.e.15.3.P.Seq | F | M0004290iA:F11 | CH15CON
966 | 240318 | RTA00002687F.d.04.1.P.Seq | F | M00039947A:D06 | CH14EDT
967 | 451620 | RTA00002691F.d.20.3.P.Seq | F | M00043379D:H02 | CH17COHLV
968 | 400157 | RTA00002685F.i.20.2.P.Seq | F | M00059561B:A09 | CH12EDT
969 | 40fT7* | RTA00002685F.h.16.2.P.Seq | F | M00039528B:BP | CH12EDT
970 | 449779 | RTA00002691F.d.04.3.P.Seq | F | M00045367B:A08 | CH17COHLV
971 | 400157 | RTA00002685F.i.20.1.P.Seq | F | M00059561B:A09 | CH12EDT
972 | 238133 | RTA00002685F.e.03.2.P.Seq | F | M00039496B:H09 | CH12EDT
973 | 452015 | RTA00002692F.c.07.2.P.Seq | F | M0004293iB:D11 | CH18CON


974 


400732 


RTA00002685F.1.0 1.2. P.Seq 


F 


M000395S7C:F12 | 


CHPEDT 


975 


24934 


1 RTA000027I IF.d.2 I.I. P.Seq 


F 


M000229IOA:A06 | 


CH03MAH | 


976 


449040 


RTA00002690F.e. 14.2. P.Seq 


F 


M0004234ID:H07 | 


CHI6COP 1 


977 


377431 


RTAO0O0267IF.i.l5.3.P.Seq 


F 


M00038303A:C03 | 


CH09LNL | 


978 


400910 


RTA00002685F.b.07. LP.Seq 


F 


M00039367B:H02 | 


CHPEDT 1 


979 


376945 


RTA00OO2682F.k.23. LP.Seq 


F 


M00040007D:A06 


CH0OLNL | 


980 


15906 


RTA00002709F.e. 14. LP.Seq 


F 


M00005S05D:DI2 


CH02COH 


981 


452731 


RTA00002692F.b.l6.2.P.Seq 


F 


M000429663 F07 


CHI SCON 


982 


415294 


RTA000026S6FT. 14. LP.Seq 


F 


M00040173D:305 


CHPEDT 


983 


40 1 644 


RTA000026S5F.n. 16, LP.Seq 


F 


M00059603D:H0I 


CHPEDT 


984 


404402 


RTA000026S7F.3.I9.2. P.Seq 


F 


M00039761D:E10 


CHUEDT 


985 


40 1 709 


RTA00002685F.n.24.2.P.Seq 


F 


M00039624A:H09 


CHPEDT 


986 


401644 


! RTA00002685F.n.l6.2.P.Seq 


F 


M0003960SD:HOI 


CHPEDT 


987


452531


RTA00002692F.f.16.2.P.Seq


F


M00043125A:B11


CH18CON






938 



989 



996 



003 
1004 



1005 
1006 



1007 



1009 



1010 



1011 
1012 



1014 



10jj__ 449825 | RTA0000269IF.b.l4.3.P.Seq 



017 377632 I RTA00002683F.1. 18.2. P.Seq 



10 i S 



1020 



1027 377 1 36 I RTA00002682F.m.Q7. 1 .P.Seq 



1023 404679 I RTA00002687F.f.07.2.P.Seq 



1030 



1032 



CLUSTER 



SEQ NAME 



400910 I RTA0Q0Q2635F.b.07.2.P.Seq 



449235 RTAQ000Z69QF.a.22.3.P.Seq 



449794 I RTA0000269IF.c.22.2.P.Seq 
400921 1 RTA0Q002685F.b. 18.1. P.Seq 



373874 
4Q1Q50 
453237 



RTA0Q002672F.c.22.2.P,Seq 
RTA00002635F.e.09,2.P.Seq 
RTA 00002693 F.c.02.2.P.Seq 



995 449294 | RTA00002690F.C. 13.3.P.Seq 



404260 



378014 
404726 
451347 



RTA0Q002680F.g. 1 7.2, P.Seq 
RTA00Q02688F.a. 1 8.2.P.Seq 
RTA0000269IF.b.l U.P.Seq 



401154 I RTA00002683F.~e.06.2.P.Seq 

401370 RTA00002686F.b.22.l.p[Seq 

400170 RTA00002685F.b.03.2.P.Seq 

25387 RTA0000271 lF.f.19. l.P.Seq 

377035 1 RTA00002678F.n. 14. l.P.Seq 



403530 
372930 



RTA00002688F.a.09.2.P.Seq 
RTA000Q2670F.j. 12.2. P.Seq 



401120 | RTA0Q002635F.C.23 .2. P.Seq 



RTA00002687F.h.02.2.P.Seq 



RTA00002690F.c.l3.3.P.Seq 



403561 I RTA00002638F.d.06.2.P.Seq 



134182 
377035 



RTA00002692F.d.l3.2.P.Seq 
RTA00002673F.n.l4.2.P.Seq 



376 1 38 | RTA0000Z674F.m.05. 1 .P. Seq 



401 154 I RTA00002685F.e.06. 1 .P.Seq 



403896 [ RTA00002637F.a.04.2.P.Seq 



1019 450045 



450345 [ RTA0000269IF.f.l0.2.P.Seq 



RTA0000269IF.e.l0.2.P.Seq , 
RTA00002686F.d.22.1.P.Seqt 
427674 "f RTA00002665F.i. 10. l.P.Seq 



402962 



jOgj 403252 | RTA00002638F.c.l5.2.P.Seq 



1023 452033 | RTA00002692F.a.09. l.P.Seq 



401553 I RTA00002685F.d.08.2.P.Seq 
451092 RTA00002691F.d. 17.3, P.Seq 
403978 I RTA00002637F.g.09.2.RSeq 



1029 373875 I RTA000O:674F.c.05. l.P.Seq 



123341 I RTA0000:633F.o l 5.2. P.Seq 



'031 33971 | RTA0000:713F.h.l3.I.P.Seq 



-2878 | RTA0000:666F.h. 13. t .P.Seq 



400731 RTA0000:685F.i.03.2. P.Seq 



1034 | 456456 [ RTA00002694F.b.22. l.P.Seq 






ORIENTATION CLONE ID 



LIBRARY 



M00Q39367B:H02 | CHI2EDT 



M00042439B;B03 CH16COP 



M0004336IB:A01 
M00039371B:H06 



M00038663D:H10 
M00039499C:A04 
M00043I08A:F06 



M00042770C:C04 CH16COP 



M00039942D:C01 CH14EDT 



M00039799A:D10 
M0004Q37!C:H05 
M000433I iC:E03 



M00039497C:C06 
M00040I31C:F03 
M00039366C:B07 
M0OO230OICC08 



CH17COHLV 
CH12EDT 



CH09LNL 
CH12EDT 
CH19COP 



CH09LNL 



CHI4EDT 
CHI7COHLV 



M00039619B:D02 . CH09LNL 



M00040368 A: FO 1 CH14EDT 



M0003343~C:A07 CH09LNL 



M0003 9 3 7 9 A : B03 C H 1 2 E DT 



M000402I9B:D02 CH14EDT 



M00042774C:C05 CH16COP 



M000405 8 "C : E07 CH I4EDT 



M0004301 1A:H!2 CHI SCON 



M000396 1 9B:D02 CH09LNL 



iV100039J69A:E12 CH09LNL 



M0003949 - C:C06 CH12EDT 



M00043320B:A07 CH 1 7COHL V 



M00039746C:H05 CH14EDT 



M000400S"D:F08 CH09LNL 



M000434 I0C.A09 CH 1 7COHLV 



CH12EDT 
CH 13EDT 
CH12EDT 
CH03MAH 



M0004339 i A:C 10 CH I 7COHL V 



M00040147D:H11 
M00028775D:F03 



M00040333D:C04 CH14EDT 



M00042623 D:D07 CH 1 SCON 



M0003948Z3:G02 CHI2EDT 



M00Q4337~A:C03 
M0004020SB:A07 



M000400UD:F03 CH09LNL 



CH13EDT 
CH0SLNH 



CH17COHLV 
CHI4EDT 



M 000402 0 3 A : H 06 CH14EDT 



M00039 1 24C:H02 CH09LNL 



M00039630C:H04 CHI2EDF 



M00027392B:H02 



M000325O"C:B0l 



M000395623:G02 



M0004344Oa:EI2 



CH04MAL 



CH0SLNH 



CH12EDT 



CH20COKLV 






SEQ 
ID 
1035 



1036 



1037 



1038 



1039 



1040 



1041 



1042 



1043



1044 



1045 



1046 



1047 



1048 



1049 



1050 



1051 



1052 



1053



1054



1055 



1056 



1057 



1058 



1059 



1060 



1061 



1062 



1064 



1065 



1066 



1067 



1068 



1069 



1070 
1071 



1072 



1073 



1074 



1075 



1076 



1077 



1078 



1079 



1080 



1081 



CLUSTER 
402337 



401974 



455141 



402057 



402555 



406092 



374351 



402365 



401828 



447669 



402588 



244858 



402339 



40 1 766 



402952 



449669 



400520 



403868 



403242 



402 1 82 



449269 



40 1 290 



448420 



374351 



448464 



401079 



403916 



401374 



219825 



377732 



380348 



449549 



402223 



401727 



179878 



378602 



448065 



403493 



4005 1 7 



456636 



400101 



403578 



4024 1 9 



7516 



401851 



400567 



SEQ NAME 
RTA00002686F.L07.1 .P.Seq 



RTA00002686F.i.l5.1.P.Seq 



RTA00002694Fb.14.LP.Seq 



RTA000026S6F.L14.LP.Seq 



RTA00002686F.m. 14. 1 .P.Seq 



RTA000026S5F.k. II. LP.Seq 



RTA00002674F.i.20. 1 .P.Seq 



RTA00002686F.j.08. 1 .P.Seq 



RTA00002686F.J. 14. LP.Seq 



RTA00002689F.a. 1 5.2.P.Seq 



RTA00002686F.k. 18. LP.Seq 



RTA000026S6F.I.02. LP.Seq 



RTA000026S6F.i.20. LP.Seq 



RTA00002686F.0. 1 6. 1 .P.Seq 



RTA00002686F.g. 14. LP.Seq 



ORIENTATION
F 



CLONE ID 
M00040257D:H10 



M00040223A:C05 



F
F 



M00043440C:B07 



M00040260C:D04 



M00040267C:C04 



M000395S4C:C1 1 



M00039147A:F10 



M00040230A:H02 



M00040232D:B07 



M000425J _B:E06 



M00040254B:CI0 



M00040256A.A06 



M00040226A:HI0 



M00040282A:A0; 



M00040181D:HIO 



RTA00002690F.C. 10.3. P.Seq 



RTA00002685F.g.04.2. P.Seq 



RTA00002687F.k.05. LP.Seq 



RTA00002687F.1.05. 1 .P.Seq 



RTA00002686F.f. 16. LP.Seq 



RTA00002690F.C. 12.3. P.Seq 



RTA00002685F.n. 10. LP.Seq 



RTA00002690F.d.07.J. P.Seq 



RTA0000:674F.i.20.:.P.Seq 



RTA00002690F.c.08._VP.Seq 



RTA00C02685F.p.05.2. P.Seq 



RTA00002687FJ. I L LP.Seq 



M00042767B:G10 



M000395I2CD06 



M00040318CHI 



M00040323B:C12 



M00040174C:EIO 



M00042770B:BI2 



M000396063:D08 



M0004279OC:C07 



M00039147A.F10 



M00042765C:D04 



M00039643C:B04 



M00040314D:H05 



RTA0000:685F.p.07.:.P.Seq 



M00039645C.EOI 



CH15EDT 



CH20COHLV 



CH13EDT 



RTA00002664F.h.06.:.P.Seq 



RTA0000268 1 F.p.09.2.P.Seq 



RTA00C02684F.d. 12. LP.Seq 



RTA00002690F.a.09.3.P.Seq 



RTA00002636F.f.05. LP.Seq 



RTA000026S5F.o.23.2.P.Seq 



RTA00C02682F.h. 12. LP.Seq 



RTA00002681 F.a.08.2.P.Seq 



M000273%D:G08 



M00039910C:G10 



M000401213:C05 



M00042431C:F01 



M00040169B:F08 



M00039642D:H09 



M00059984A:C02 



M00039839C:E05 



RTA00002690 F. e .2 2 . 3 . P.Seq 



RTA00002637F j.03. 1 .P.Seq 



RTA00C02637F.k. 15. LP.Seq 



RTA00002694F.e.05 . 1 . P.Seq 



RTA00002635F.O.04. LP.Seq 



RTAOOCOI687F.i.01.l P.Seq 



RTA00CQ2636F.g.20. LP.Seq 



RTAOOC02676F.n.01 2.P.Seq 



RTA0P002636F.d.07, LP.Seq 



RTA00C0263:F.a. 14.2. P.Seq 



M00042781 A:A07 



M00040313D:E04 



M0004032OD:FO2 



M00043632D:F09 



M00039625B:G08 



M00040296D:£09 



M00040lS-iC:Al 1 



M000393IPB.HI2 



M00040143.A:H05 



M0003936I3:E01 



CH13EDT 



CH12EDT 



CH09LNL 



CHI3EDT 



CH13EDT 



CH15CON 



CH13EDT 



CH13EDT 



CH13EDT 



CH13EDT 



CH13EDT 



CH16COP 



CH12EDT 



CH14EDT 



CH14EDT 



CH13EDT 



CH16COP 



CH12EDT 



CH16COP 



CH09LNL 



CH16COP 



CH12EDT 



CH14EDT 



CHI2EDT 



HI2EDT 



CH04MAL 



CH09LNL 



CH09LNL 



CH16COP 



CH13EDT 



CH12EDT 



CHO^IM 



CHO^LNL 



CH16C0P 



CH14EDT 



CH14EDT 



CH20COHLV 



CHI2EDT 



CHI4EDT 



CH13EDT 



CH09LNL 



CH13ED 



CH12EDT 






SEQ 
ID 



CLUSTER 


SEQ NAME 


ORIENTATION


CLONE ID


LIBRARY 


1082 


376641 


RTA00002677F.d.O 1 .2. P.Seq 


F 


M000393*±5A;D09 


CH09LNL 


1083 


376641 


RTA00002677F.c.24.2.P.Seq 


1 F 


M0003934: A:D09 


CH09LNL 


1084 


400450 


RTA00002685F.J.22. 1 .P.Seq 


F 


M00O39570A:DI0 


CH12EDT 


1085 


375373 


RTA00002676F.fi* 1 2. l.P.Seq 


F 


MO0O393OCCC09 


CH09LNL 


1086 


375373 


RTA00002676F.h.l2.2.P.Seq 


F 


M00039300C:C09 


CH09LNL 


1087 


413643 


RTA00002685F.n.05.2.P.Seq 


F 


M0003960-^D:G03 


CH12EDT 


1088


448874 


RTA00002690F.C.02.3. P.Seq 


F 


M00042759B:G1 1 


CHI6COP 


1089 


376511 


RTA00002674F.h.04. 1.P.Seq 


F 


MO003914OA.B08 


CH09LNL 


1090 


374040 


RTA00002674F.H.2 l.l.P.Seq 


F 


M0003914ZD:BI 1 


CH09LNL 


1091 


454132 


RTA00002693F.e.l8.1. P.Seq 


F 


M00043I9! A:A07 


CH19COP 


1092 


404581 


RTA00002687F.2. 1 l.l.P.Seq 


F 


M00040208D:G09 


CH14EDT 


1093 


260521 


RTA00002689F.C. 13.1. P.Seq 


F 


M0004270:3:G02 


CH15CON 


1094 


379564 


RTA00002687F.o.l2.1.P.Seq 


F 


M00040346A:C1 I 


CHI4EDT 


1095 


452491 


RTA00002692F.f.05.2.P.Seq 


F 


M00043046D:BI 1 


CHI SCON 


1096


403541 


RTA00002687F.p.20.2. P.Seq 


F 


M0004036-A:E05 


CH14EDT 


1097 


404636 


RTA00002688F.b.l l.2.P.Seq 


F 


M00040376CG02 


CH14EDT 


1098 


379564 


RTA00002687F.o.l2.2.P.Seq 


F 


M0O040346A:CI 1 


CHI4EDT 


1099 


45 1 548 


RTA00002691F.b.0Q.3. P.Seq 


F 


M0004331CC:G06 


CHI7COHLV 


1100 


454308 


RTA00002693F.f.U.LP.Seq 


F 


M00043213 3:B12 


CH19COP 


1101


401 184 


RTA00002685F.d.04.2.P.Seq 


F 


M0003938CC:C09 


CHI2EDT 


1102 


401290 


RTA00002685F.n. 1 O.Z.P.Seq 


F 


M000396063:DO8 


CHI2EDT 


1103


400101 


RTA00002685F.o.04.2.P.Seq 


F 


M00039625 3:G03 


CH12EDT 


1104


454308 


RTA00002693F.f.U.2.P.Seq 


: F 


M00043213B:BI2 


CH19COP 


1105


452622 


RTA00002692F.b. 1 4. 1.P.Scq 


F 


M00042962D:C05 


CHI SCON 


1106


450012 


RTA0000269IF.d.09.3. P.Seq 


F 


M000433703:C08 


CH17COHLV 


1107 


400503 


RTA00002685F.k.02.2.P.Seq 


F 


M000395703.-D10 


CHI2EDT 


1108


400450 


RTA000O2685FJ.22.2.P.Seq 


F 


M00039570A:D10 


CHI2EDT 


1109 


446166 


RTA00002689F c.1T. I. P.5eq j 


F 


M000427I !3:A1 I 


CHI5CON 


1110


456233 


RTA000O2694F.e.0S. 1 .P.Seq 


F 


M000436J63:C06 


CH20COHLV 


1111


25443 


RTA00002710F.d. 15.1. P.Seq 


F 


M00021S6cO:A03 


CH03MAH 


1112 


4041 19 


RTA00002633F.d. 1 7.2.P.Seq 


F 


M00040392C:B12 


CHI4EDT 


1113


403642 


RTA00002687F.d.O l.l.P.Seq 


F 


M00039945C:F09 


CH14EDT 


1114


403493 


RTA00002687F.j.03.2.P.Seq 


F 


M00040313D:E04 


CHI4EDT 


1115 


454132 


RTA00002693F.e.l8.2.P.Seq 


' F 


M0004319iA:A07 


CH19COP 


1116


450607 


RTA0000269IF.d.l2.3.P.Seq 


- F 


M00043372C:G05 


CH17COHLV 


1117 


451718 


RTA00002692F.e.24.2.P.Seq 


F 


M0004304-3:A12 


CHI SCON 


1118


453907 


RTA00002693F.b.08.2.P.Seq 


F 


M0004308"3:G07 


CH19COP 


1119


447669 


RTA00002689F.a. 15. 3. P.Seq 


F 


M0004253S3:E06 


CH15CON 


1120 


404044 


RTA00002687F.p.l l.l.P.Seq 


F 


M0004035i D:AI 1 


CHI4EDT 


1121 


449617 


RTA00002690F.e.l6.2.P.Seq 


' F 


M00042S4?D:FI 1 


CH16COP 


1122


452723 


RTA00002692F.e.l8.2.P.Seq 


F 


M00043036C:E05 


CHI SCON 


1123 


270014 


RTA00002685F.i.l5.2.P.Seq 


F 


M00039536C:HI 1 


CH12EDT 


1124 


401 198 


RTA00002685F.i.l4.2.P.Seq 


F 


M0003<>536C:CIO 


CH12EDT 


1125 


452414 


RTA00002692F.e. 1 2.1. P.Seq 


F 


MOOOJ3032C:A10 


CHI SCON 


1126


453019 


RTA00002692F.d.lS.2.P.Seq 


F 


M000430I5A:HI0 


CH18CON 


1127


403642 


RTA00002687F.C.24. 1. P.Seq 


F 


M0003994;C:F09 


CHI4EDT 


1128


401437 


RTA00002685F.c.18.2.P.Seq


F


M00039377D:E12


CH12EDT 






SEQ 
ID 



CLUSTER 



129 



452414 



130 



404122 



131 



400567 



132 



401437 



404642 



134 



376007 



135 



402835 



136 



403774 



137 



45505 



138 



452071 



139 



449832 



140 



379004 



141 



455211 



142 



37902 1 



143 



376279 



144 



374373 



145 



97668 



146 



400407 



147 



402904 



148 



403912 



149 



4005 1 1 



150 



402746 



151 



403849 



152 



401471 



153 



404362 



154 



373641 



155 



401952 



156 



400685 



380462 



159 



400078 



160 



373748 



16 



401392 



162 



20548 



SEQ NAME 



RTA00002692F.e. 12.2. P.Seq 



RTA00002637F.n. 10. LP.Seq 



RTA00002685F.a.l4.LP.Seq 



RTA00002685F.C. 1 8. 1 .P.Seq 



RTA00002687F.L02. LP.Seq 



RTA00002676F.f.22.2.P.Seq 



RTA00002686F.b.24. 1 .P.Seq 



RTA00002687F.d.08. 1 .P.Seq 



RTA 000027 1 2F.d.04. 1 . P.Seq 



RTA00002692F.c.05.2.P.Seq 



RTA0000269IF.e.l3.LP.Seq 



RTA00002633F.n.O9.2.P.Seq 



RTA00002694F.b.07.l. P.Seq 



RTA00002683F.n.l3.2.P.Seq 



RTA00002680F.d. 10.2.P.Seq 



RTA0000268IFn.21.LP.Seq 



RTA00002636F.d. 1 9. LP.Seq 



RTA000O2685F.a.05.:.P.Seq 



RTA00002686F.n.l5.I.P.Seq 



RTA00002687FJ. 19. LP.Seq 



RTAO00O2685F.b.23.2.P.Seq 



RTA00002686F.a. 14. LP.Seq 



RTA00002687F.n.09.2.P.Seq 



RTA00002685 F.o. 10. 1 .P.Seq 



RTA00002637F.O.06.2. P.Seq 



RTA00002677F.i.09.2.P.Seq 



RTA00002686FJ. 1 0. LP.Seq 



RTA00002685F.rn.09.2. P.Seq 




RTA00002670F.O.0 1 .2. P.Seq 



RTA00002685F.m. 1 5.2. P.Seq 



RTA0000267IF.I.06.3.P.Seq 



RTA00002635F.f.03.2. P.Seq 



163 



-76279 



164 



374428 



165 



374428 



166 



172914 



167 



378320 



163 



235422 



169 



40247: 



170 



374828 



171 



403912 



172 



401471 



17; 



404362 



174 



403849 



175 



395617 



RTA000O27l0F.h.l5. LP.Seq 



RTA00002680F.d. 10. LP.Seq 



RTA00002672F.a.20. LP.Seq 



RTA00002672F.a.20.2. P.Seq 



RTA000O2679FJ.2L LP.Seq 



RTA00002631F.L 14.2. P.Seq 



RTA00002665F.h. 19. LP.Seq 



RTA00002686F.p.l 1. LP.Seq 



RTA00002674F.m. 1 0. LP.Seq 



ORIENTATION 



RTA0OOO:637F.j. 19.2. P.Seq 



RTA00002685F.0. 10.2. P.Seq 



RTA00002687F.O.06. LP.Seq 



RTA 00002687F.n. 09. LP.Seq 



RTA0000:687F.b.l5.LP.Seq 






CLONE ID 



M00043032CA10 



M00040334D:B02 



M00039361B:E01 



M00039377D:EI2 



M00040201C:GI1 



M00039293B:CII 



M00040131D:G08 



M00039947C:G03 



M00023377B:F01 



M00042979B:E02 



M00043393A:B08 



M00040093B:C02 



M00043430B:C02 



M00040093D:DO: 



M00039735D-.G05 



LIBRARY 



CH18CON 



CH14EDT 



CH12EDT 



CH12EDT 



CH14EDT 



CH09LNL 



CH13EDT 



CHI4EDT 



CH04MAL 



CHI SCON 



CH17COHLV 



CH09LNL 



CH20COHLV 



CH09LNL 



CH09LNL 



M00039903A:H07 



M00040145D:DO: 



M000391S4A:D03 



M00040274A:H1 I 



M00040317A:H03 



M00059372C:DI2 



M00039740B:FlO 



CH09LNL 



CH13EDT 



CH12EDT 



CH13EDT 



CH14EDT 



CHI2EDT 



CH13EDT 



M00040333D:G05 



M00039629B:F01 



M00040342B:DI2 



M00039403A:GI2 



M0004023IB:C03 



M0003^ T D:F04 



M000 



j j 



0B:E06 



M00039600A:A1 1 



M00038325D:F12 



M00039505C:E03 



M0002224^A:E02 



M00039785D:G05 



M00038633B:G02 



M0OO3S633B:G02 



M00039696A:E05 



M00039394C:H07 



M0002S76SC:D05 



M000402S~C:B09 



M0003Ol70A:Bl0 



M000403TA:H03 



M0003962^B:F0i 



M00040342B:D12 



M00040333D:G05 



M0003976"B:A04 



CHI4EDT 



CH12EDT 



CH14EDT 



CH09LNL 



CHI3EDT 



CH12EDT 



CH13EDT 



CH09LNL 



CHI2EDT 



CH09LNL 



CH12EDT 



CH03MAH 



CH09LNL 



CH09LNL 



CH09LNL 



CH09LNL 



CH09LNL 



CHOSLNH 



CH13EDT 



CH09LNL 



CH 14EDT 



CH12EDT 



CH14EDT 



CH14EDT 



CHI4EDT 






SEQ 
ID 


CLUSTER 


SEQ NAME 


ORIENTATION 


CLONE ID


LIBRARY 


1 176 


40 1 709 


RTA00002685 F.o.0 1 .2.P.5eq 


F 


M00039624A:H09 


CH12EDT 


1 177 


404464 


RTA00002687F.O.22. 1 .P.Seq 


F 


M00040347D:F09 


CH14EDT 


1178 


447795 


RTA00002689F.e.06.3. P.Seq 


F 


M00042S95CG0I 


CHI5CON 


1179 


18139 


RTA00002708F.f.lO.I.P.Seq 


F 


M00004139B:BIO 


CH01COH 


1 180 


403898 


RTA00002687F.a.05. 1 .P.Seq 


F 


M00039746CH06 


CH14EDT 


1181 


453512 


RTA00002693 F.a.2 1 .2.P.Seq 


F 


M00043078D:D04 


CH19COP 


1 182 


404 1 72 


RTA00002687F.d. 17.1. P.Seq 


F 


M00039951B:B12 


CH14EDT 


1183 


400973 


RTA00002685F.c.06.2.P.Seq 


F 


• M00039374C.H12 


CHI2EDT 


1184 


450198 


RTA00002691 F.e.23.2.P.Seq 


F 


M00043405A:DI 1 


CH17COHLV 


1 185 


451502 


RTA00002691F.f.03.2.P.Seq 


F 


M00043406B:G12 


CH17COHLV 


1186 


454414 


RTA00002693F.f.l8.2.P.Seq 


F 


M00043220B:C04 


CH19COP 


1187 


453752 


RTA00002693F.b.02.2.P.Seq 


F 


M00043081D:F05 


CH19COP 


1188


403700 


RTA00002687F.g.03. 1 .P.Seq 


F 


M00040207B:D0S 


CH14EDT 


1189 


403371 


RTA00002687F.h. 19. 1 .P.Seq 


F 


M00040294D:D12 


CH14EDT 


1190


14583 


RTA00O02687F.f.O8. l\P.Seq 


F 


M00040203B:A05 


CHI4EDT 


1191 


404161 


RTA00002687F.e.20. 1 .P.Seq 


F 


M00039958C:B09 


CH14EDT 


1192 


403274 


RTA0000268"F.b. 10.1. P.Seq 


F 


M00039766A:G07 


CHI4EDT 


1193 


373465 


RTA00002671F.O.09.I. P.Seq 


F 


M00038615A;H12 


CH09LNL 


1194 


402582 


RTA000026S6F.m.08. 1 .P.Seq 


F 


M00040265D:C08 


CH13EDT 


1 195 


402241 


RTA000026S6F. 1.16.1. P.Seq 


F 


M00040261C:F01 


CH13EDT 


1 196 


380451 


RTA00002670F.p. 12.1. P.Seq 


F 


M0003358ID:D08 


CH09LNL 


1197 


455938 


RTA00002694F.d.24. 1. P.Seq 


F 


M00043528C:A02 


CH20COHLV 


1 198 


374297 


RTA00002672F.i.02.2.P.Seq 


F 


M00039013D:F02 


CH09LNL , 


1199 


402624 


RTA0O002636F.p. 13.1. P.Seq 


F 


M000402S7D:D07 


CH13EDT 


1200 


402322 


RTA00002686F.J. 16.1. P.Seq 


F 


M00040233A:H02 


CH13EDT 


1201 


449504 


RTA00002690F.C. 1 1 .2.P-Seq 


F 


M00042769C:E09 


CH16COP 


1202 


226704 


RTA00002664F.a. 11.1 .P.Seq 


F 


MOOO23352D:H0j 


CH04MAL 


1203 


271092 


RTA00002690F.b.23.2.P.Seq 


F 


M00042756D:A10 


CH16COP 


1204 


400864 


RTA00002685F.g.l7.2.P.Seq 


F 


M00039517B:G12 


CH12EDT 


1205 


235855 


RTA00002667F.O.06. 1 .P.Seq 


F 


M00032876C:D06 


CH08LNH 


1206 


402789 


RTA000026S6F.g. 16.1. P.Seq 


F 


M00040183A:F07 


CH13EDT 


1207 


19826 


RTA000027IOF.k.05.l. P.Seq 


F 


M00022467C:B12 


CH03MAH 


1208 


380157 


RTA00002632F.h. 19.1. P.Seq 


F 


M00039934D:G12 


CH09LNL 


1209 


401 187 


RTA00002685F.e.l5.2.P.Seq 


F 


M00039500C:C04 


CHI2EDT 


1210 


427346 


RTA00002665F.b.01.3.P.Seq 


F 


M00028066C:D07 


CH08LNH 


1211


402366 


RTA00002636F.C. 15.1. P.Seq 


F 


M00040I38B:H03 


CH13EDT 


1212 


376712 


RTA0000267"F.c.l3.2.P.Seq 


F 


M00039343B:F12 


CH09LNL 


1213 


401655 


RTA00002635F.C.22. 1 .P.Seq 


F 


M00039378D:H0" r 


CH12EDT 


1214 


400147 


RTA0000268 5 F.g. 10.1. P.Seq 


F 


M00059515A:A06 


CH12EDT 


1215 


400364 


RTA00002685F.g. 17.1. P.Seq 


F 


M00039517B:G12 


CH12EDT 


1216 


451600 


RTA0000269lF.b.l9.3.P.Seq 


F 


M0004332SD:H02 


CHI7COHLV( 


1217 


400147 


RTA00002685F.g.l0.2.P.Seq 


F 


M00039515A:A06 


CHI2EDT 


1218 


401655 


RTA00002685F.C.22.2. P.Seq 


F 


M0003937SD:H07 


CH12EDT 


1219 


449307 


RTA00002690F.a.l0.3.P.Seq 


F 


M00042431D:C10 


CH16COP 


1220 


403121 


RTA00002638F.a.OI.2.P.Seq 


F 


M00040366A:BOI 


CH14EDT ] 


1221


451713 


RTA000026«2F.e.24.1. P.Seq 


F 


M00043044B:A12 


CHI SCON 


1222 


294345 


RTA000026S5F-. 14.1. P.Seq 


F 


M00039515D:C1 1 


CH12EDT 






SEQ 



1224 



1225 



1226 



1227 



1228 



1229 



1230 



1231 



1232 



1233



1234 



1235 



1236 



1237 



1238 



1239 



1240 



1241 



1242 



1243



1244 



1245 



1246 
1247 



1248



1249 



1250 
1251 



1252 



1253 



1254 



1255 



1256 



1257 



1258 



1259 



1260 



1261 



1262 



1263 



1264 



1265 



1266 



1267 



1268 



1269 



CLUSTER 
186541 



403898 



403541 



45077: 



376236 



422357 



404532 



40369: 



403693 



401515 



404532 



452077



18003 



377014 



404232 



404232 



406263 



452077 



454349 



447671 



44760! 



456764 



40 1 827 



404520 



449798 



45099: 



377471 



400404 



19106 



404024 



446404 



392921 



376850 



453011 



2343! 1 



402708 



45101 



45301 I 



80462 



379602 



403896 



40339' 



271723 



451379 



456624 



7548: 



402229 



SEQ NAME 
RTAO00027l2F.p.23.2.P.Seq 



RTA00002687F.a.05.2.P.Seq 



RTA00002687F.p.2a LP.Seq 



RTA0000269lF.d.24.3.P.Sea 



RTA00002685F.I.24.2.P.Seq 



RTA00002683F.C.21. LP.Seq 



RTA00002687F.p. l0.2.P.Seq 



ORIENTATION 
F 



RTA00002687F.J. 23. LP.Seq 



RTA00002687F.j.23.2.P.Seq 



RTA00002685F.o.02.2.P.Seq 



RTA00002687F.p. 1 0. LP.Seq 



RTAQ0002692F.d.0L2.P.Seq 



RTA0000271 1 F.b.04. LP.Seq 



RTA00002682F.f. 13. LP.Seq 



RTA000026S7F.n. !2.2.P.Seq 



RTA0O0O2687F.n.l2.LP.Seq 



RTA00002635F.d.l4.LP.Sea 



RTA00002692F.c.24.2.P.Seq 



RTA00002693F.c.09.2.P.Seq 



RTA0OOO2689F.e. 12. LP.Seq 



RTA00002693F.b.l4.2.P.Sea 



RTA00002694F.C. 14. LP.Seq | 



RTA00002636F.L19. LP.Seq 



RTA0O002687F.f.05. l.P.Sec 



RTA00002691F.d.02.3.P.Seq | 



RTA0000269IF.c.l2.3.P.Seq 



RTA0000269lF.c.02.3.P.Seq 



CLONE ID 
M0OO27ISiD:A05 



M00039746C:H06 CH14EDT



M00040364A:E05 CHI4EDT 



M00043383D:A02 CHI7COHLV 



M0003959;C:E05 



CHI2EDT 



M00040385C:D02 



CHI4EDT 



M0004035I3:F02 



CH14EDT 



M000403I7D:F02 



CH14EDT 



M00040317D:F02 



CH14EDT 



M0003962-iB:FI2 



CH12EDT 



M0004035i3:F02 



CHI4EDT 



M00043002A:E05 



CHI SCON 



M00022821CC09 CH03MAH 



M00039973D:C08 CH09LNL 



M00040334D:C07 



CH14EDT 



M00040334D:C07 



CH14EDT 



M00039493A:C04 



CH12EDT 



M00043002A:E05



CHI8CON 



M00043 I 33 3:C I 1 



CH19COP 



M00042904B:E07 



CH15CON 



M00043095A:F09 



CH19COP 



M000434653:H02 CH20COHLV 



M00040262B:B06 



CH13EDT 



M00040202A:F0: 



CH14EDT 



M00043366A:A02 CHI7COHLV 



M00043350D:B1I CH 1 7COHLV 



M00043339A:F1 I CHI 7COHLV 



RTA000026S6F.a. 1 7. LP.Seq 



RTA0000269IF.e.08.2.P.Seq 



RTA00002687F.e. 18. LP.Seq 



RTA00002689F.b. 14. LP.Seq 



RTA00002677F.k.l2.2.P.Seq 



RTA00002673F.e. 10.2.P.Seq 



RTA00002692F.fl0.2.P.Seq 



RTA00002691F.a.03.3.P.Seq 



RTA000026S6F.m. 1 1. LP.Seq 



RTA0000269IF.f.08.2.P.Seq 



RTA00002692F.f. 1 0. 1 .P.Selff 



F 



RTA00002670F.n.24.2.P.SeqT 



M0003975:B:G08 



CHI3EDT 



M0004333SC:E03 CH17COHLV 



MUUUj^SA.AOS CH I 4EDT 



M00042566C.C05 



CH15CON 



M000J94! 1CE07 



CH09LNL 



M00039458B:HI1 



CH09LNL 



M00043066B:HI I 



CHI SCON 



M000423?:D:C01 CH 17COHLV 



M0004026-A:E06 



CH13EDT 



M0004340^B:BO3 CH 17COHLV 



M00043066B:H11 



M000335"OB:E06 



CHI SCON 
CH09LNL 



RTA000026S I F.c.2 1 .2. P.Seq 



RTA00002687F.a-04. 1 .P.Seq 



RTA00002687F.h.02. LP.Seq 



RTA0O0O2686F.b.05. 1 .P.Seq 



RTA0000269lF.b.l2.2. P.Seq 



RTA00002694F.e.02. LP.Seq 



RTA00O02636F.n. 14. LP.Seq 



RTA00002686F.i.09. LP.Seq 



M00039S55C.F01 



CH09LNL 



M00039746C.H05 



CH14EDT 



M00040219B:D02 



M00039755A:308 



CH14EDT 
CHI3EDT 



M000433i2C:E08 CHI 7COHLV 



M000436163:F02 CH20COHLV 



M00040:^4A:D07 



M00040::iA:GlI 



CHI3EDT 
CHI3EDT 






SEQ 
ID


CLUSTER


SEQ NAME


ORIENTATION


CLONE ID


LIBRARY 


1270 


377039 


RTA00002686F. 0.12.1. P.Seq 


F 


M000402SOCH05 


CH13EDT 


1271 


18041 


RTA000027 1 OF.h.2 U .P.Seq 


F 


M00022262DG03 


CH03MAH 


1272 


401381 


RTA00002685F.O.08. 1 .P.Seq 


F 


M00039626D:F04 


CH12EDT 


1273 


428491 


RTA00002666F.C.05.I. P.Seq 


F 


M00032535D:H01 


CH08LNH 


1274 


54656 


RTA0000266IF.i.* , r\P.Seq 


F 


M000043"2B:F07 


CHOICOH 


1275 


379183 


RTA00002679F.L 1 7 A .P.Seq 


F 


M00039638C.G06 


CH09LNL 


1276 


25594 


RTA0000271 IF. f.07.1. P.Seq 


F 


M00022963B:E02 


CH03MAH 


1277 


403355 


RTA00002687F.d.l 1.1. P.Seq 


F 


M00039948D:Dll 


CH14EDT 


1278 


16789 


RTA00002709F.b.09.2.P.Seq 


F 


M00005332B:F08 


CH02COH 


1279 


23292 


RTA00002708F.C.02. 1 .P.Seq 


F 


M00003750D:E06 


CHOICOH 


1280 


373982 


RTA00002673F.b.24.2.P.Seq 


F 


M00039058A:A04 


CH09LNL 


1281 


373982 


RTA00002673F.C.0 1.2. P.Seq 


F 


M000390:8A:A04 


CH09LNL 


1282 


449911 


RTA0000269 1 F.e.02.2.P.Seq 


F 


M0004338^B:B02 


CHI 7C0HLV 


1283 


450633 


RTA0000269 1 F.f.02.2,P.Seq 


F 


M00043405C:GI2 


CH17C0HLV 


1284 


23939 


RTA0000271 3 Fj. 14.1. P.Seq 


F 


M000274S6A:F06 


CH04MAL 


1285 


450633 


RTA00002691F.f.02.1. P.Seq | F 


M00043405C:G12 


CH17C0HLV 


1286 


379122 


RTA00002672F.n. 1 4. 1 .P.Seq | F 


M000390:^B:F09 


CH09LNL 


1287 


449429 


RTA00002690F.a. 16. 3. P.Seq 


F 


M000424"A:D04 


CH16C0P 


1288 


430578 


RTA00002668F 2. 18.1. P.Seq 


F 


M000329S-iC:G05 


CH08LNH 


1289 


425824 


RTA00002687F.0. 1 7. 1 .P.Seq 


F 


M0003976"C:E12 


CH14EDT 


1290 


425824 


RTA00002687F.b.t7.2.P.Seq 


F 


M0003976-C:E12 


CH14EDT 


1291 


401266 


RTA0O002685F.U l.2.P.Seq 


F 


M00039535Q:DIO 


CH12EDT 


1292 


377949 


RTA00002674F.p.04. 1 P.Seq 


F 


M00O3920OA.CI0 


CH09LNL 


1293 


12926 


RTA000027l0F.e.2 1.1. P.Seq 


F 


M00022005C.C06 


CH03MAH 


1294 


378242 


RTA00002679F.c.20.2.P.Seq 


F 


M0003966-D:G07 


CH09LNL 


1295 


401781 


RTA00002686F.e.03. 1 .P.Seq 


F 


M00040160B:AIO 


CH13EDT 


1296 


453101 


RTA00002693F.C. 16.2.P.Seq | F 


M00043 U33 A 10 


CHI9C0P 


1297 


377592 


RTA00002677F.U2.2.P.Sea j F 


M000394i5D:E01 


CH09LNL 


1298 


404340 


RTA00002687F.b.05. 1 .P.Seq \ F 


M0003976-C.D07 


CH14EDT 


1299 


400968 


RTA00002685F.h.0l.2.P.Seq j F 


M0003952:D:H03 


CH12EDT 


1300 


400968 


RTA00002635F.2.24.2. P.Seq 


F 


M000395ZID:H03 


CH12EDT 


1301 


374417 


RTA0000267IF.J.1 5.3. P.Seq 


F 


M0003S3i5C:Gl 1 


CH09LNL 


1302 


374621 


RTA00002675F.p,02. 1 .P.Seq 


F 


V100039263DAI2 


CH09LNL 


1303 


19063 


RTA00002708F.i. 14. [.P.Seq 


F 


M0000436;.A:H02 


CHOICOH 


1304 


135941 


RTA000027 1 3F.2.06. 1 P.Seq | F 


M000273f-3:G05 


CH04MAL 


1305 


403355 


RTA00002687F.d. 1 1 .2. P.Seq | F 


M0003994$O:DI I 


CHI4EDT 


1306 


375226 


RTA00002677F.m.08.2. P.Seq 


F 


M000394i-C:A01 


CH09LNL 


1307 


. 222653 


RTA00002664F.e.l4.2.P.Seq 


F 


M00027iO.'3:A09 


CH04MAL 


1308 


447978 


RTA0000:6O0F.d.l 1.3. P.Seq 


F 


M00042800A:A03 


CH16C0P 


1309 


431346 


RTA0000266 t >F.2.24. 1 .P.Seq | F 


M000332iSA:C04 


CH08LNH 


1310 


455579 


RTA00002694F.a. 10 1. P.Seq j F 


M00042:>°co:r06 


CH20COHLV 


1311 


13406 


RTA00002709F.1. 1 4. 1 .P.Seq j F 


M0000712-DH10 


CH02COH 


i3i: 


378364 


RTA0000:674F.o. 1 7. 1 .P.Sea | F 



MOOOj^I^cD A07 


CH09DNL 


1313 


373788 
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LIBRARY 


251 


23954 


RT A00002 8 94 F. k . 06 . 1 . P . Se q 


F 


M00G04036B:All 


CHOICOH 


252 | 12399 | RTA00002918F.n.10.1.P.Seq | F | M00032985D:G09 | CH08LNH
253 | 30853 | RTA00002907F.j.16.2.P.Seq | F | M00022216D:D10 | CH03MAH
254 | 8615 | RTA00002907F.l.17.2.P.Seq | F | M00022238C:G04 | CH03MAH
255 | 142359 | RTA00002905F.c.10.1.P.Seq | F | M00007980B:A07 | CH03MAH
256 | 9565 | RTA00002926F.g.08.2.P.Seq | F | M00040127C:D02 | CH09LNL
257 | 17334 | RTA00002902F.d.03.1.P.Seq | F | M00006582D:A09 | CH02COH
258 | 12540 | RTA00002886F.f.03.1.P.Seq | F | M00001349C:B04 | CH01COH
259 | 17289 | RTA00002926F.f.14.2.P.Seq | F | M00040118D:C05 | CH09LNL
260 | 46798 | RTA00002907F.i.13.2.P.Seq | F | M00022202C:C04 | CH03MAH
261 | 7797 | RTA00002905F.b.06.1.P.Seq | F | M00007953D:F07 | CH03MAH
262 | 1945 | RTA00002896F.p.04.1.P.Seq | F | M00004201D:C03 | CH01COH
263 | 6084 | RTA00002896F.o.02.1.P.Seq | F | M00004195A:F07 | CH01COH
264 | 6091 | RTA00002930F.c.03.1.P.Seq | F | M00042915B:G11 | CH15CON
265 | 186105 | RTA00002930F.c.10.1.P.Seq | F | M00055430A:A01 | CH15CON
266 | 11341 | RTA00002930F.h.07.1.P.Seq | F | M00055961C:B10 | CH15CON
267 | 2520 | RTA00002930F.c.10.1.P.Seq | F | M00055639A:E06 | CH15CON
268 | 136735 | RTA00002903F.k.06.1.P.Seq | F | M00007006C:C12 | CH02COH
269 | 8336 | RTA00002900F.e.20.1.P.Seq | F | M00004873B:G04 | CH02COH
270 | 13926 | RTA00002907F.h.19.1.P.Seq | F | M00022193B:A09 | CH03MAH
271 | 11119 | RTA00002906F.k.01.1.P.Seq | F | M00022009C:A08 | CH03MAH
272 | 11119 | RTA00002906F.j.24.1.P.Seq | F | M00022009C:A08 | CH03MAH
273 | 11726 | RTA00002906F.l.07.1.P.Seq | F | M00022051B:D07 | CH03MAH
274 | 6799 | RTA00002925F.g.21.1.P.Seq | F | M00039885C:D11 | CH09LNL
275 | 17266 | RTA00002889F.g.09.1.P.Seq | F | M00001544B:B05 | CH01COH
276 | 9479 | RTA00002924F.g.04.1.P.Seq | F | M00039560B:G09 | CH09LNL
277 | 185557 | RTA00002912F.j.13.1.P.Seq | F | M00027457B:E11 | CH04MAL
278 | 27872 | RTA00002906F.e.14.1.P.Seq | F | M00021923D:H02 | CH03MAH
279 | 15513 | RTA00002924F.a.21.1.P.Seq | F | M00039617C:A10 | CH09LNL
280 | 4446 | RTA00002891F.m.15.1.P.Seq | F | M00003773A:F10 | CH01COH
281 | 1681 | RTA00002916F.g.07.1.P.Seq | F | M00032577D:F01 | CH08LNH
282 | 24243 | RTA00002887F.n.13.1.P.Seq | F | M00001424D:D02 | CH01COH
283 | 16049 | RTA00002900F.c.11.1.P.Seq | F | M00004846A:A10 | CH02COH
284 | 186267 | RTA00002910F.h.11.1.P.Seq | F | M00022924B:A05 | CH03MAH
285 | 4543 | RTA00002925F.h.22.1.P.Seq | F | M00039895D:G04 | CH09LNL
286 | 6176 | RTA00002914F.d.23.1.P.Seq | F | M00028188C:H11 | CH08LNH
287 | 29043 | RTA00002906F.h.17.1.P.Seq | F | M00021974D:F01 | CH03MAH
288 | 696 | RTA00002922F.o.15.1.P.Seq | F | M00039143A:F04 | CH09LNL
289 | 7225 | RTA00002891F.l.22.1.P.Seq | F | M00003770C:A10 | CH01COH
290 | 25609 | RTA00002899F.h.15.1.P.Seq | F | M00004608A:H04 | CH01COH
291 | 6295 | RTA00002922F.o.24.1.P.Seq | F | M00039146B:G04 | CH09LNL
292 | 186319 | RTA00002912F.e.01.1.P.Seq | F | M00027290C:F06 | CH04MAL
293 | 4539 | RTA00002889F.d.04.1.P.Seq | F | M00001534C:E07 | CH01COH
294 | 17841 | RTA00002891F.m.06.1.P.Seq | F | M00003771D:A03 | CH01COH
295 | 13720 | RTA00002924F.c.05.1.P.Seq | F | M00039430A:E04 | CH09LNL
296 | 7300 | RTA00002925F.a.14.1.P.Seq | F | M00039806B:D05 | CH09LNL
297 | 186280 | RTA00002912F.f.13.1.P.Seq | F | M00027316C:C03 | CH04MAL
298 | 185585 | RTA00002912F.n.04.1.P.Seq | F | M00027569A:E05 | CH04MAL
299 | 3447 | RTA00002900F.l.11.1.P.Seq | F | M00005364B:E10 | CH02COH
300 | 14487 | RTA00002889F.[illegible]9.1.P.Seq | F | M0000154[illegible]B:F09 | CH01COH



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ShQ 














ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
301 | 5338 | RTA00002901F.g.16.1.P.Seq | F | M00005505A:F01 | CH02COH
302 | 7766 | RTA00002917F.e.07.1.P.Seq | F | [illegible] | CH08LNH
303 | 7450 | RTA00002899F.g.09.1.P.Seq | F | M00004509D:C06 | CH01COH
304 | 15369 | RTA00002908F.m.18.1.P.Seq | F | [illegible] | CH03MAH
305 | 4954 | RTA00002919F.i.17.1.P.Seq | F | M00033150C:A11 | CH08LNH
306 | 17189 | RTA00002900F.i.11.1.P.Seq | F | [illegible] | [illegible]
307 | 186561 | RTA00002912F.m.23.1.P.Seq | F | [illegible] | [illegible]
308 | 44645 | RTA00002896F.h.22.1.P.Seq | F | [illegible] | CH01COH
309 | 11404 | RTA00002924F.a.24.1.P.Seq | F | [illegible] | [illegible]
310 | 38212 | RTA00002893F.m.22.1.P.Seq | F | [illegible] | [illegible]
311 | 22099 | RTA00002890F.m.09.1.P.Seq | F | M00001648A:D10 | CH01COH
312 | 25041 | RTA00002890F.p.12.1.P.Seq | F | [illegible] | CH01COH
313 | 185938 | RTA00002911F.p.01.1.P.Seq | F | [illegible] | CH04MAL
314 | 9414 | RTA00002908F.o.06.1.P.Seq | F | [illegible] | CH03MAH
315 | 185707 | RTA00002911F.o.19.1.P.Seq | F | M00027167C:B10 | CH04MAL
316 | 185499 | RTA00002912F.n.19.1.P.Seq | F | M00027586A:C09 | CH04MAL
317 | 25704 | RTA00002912F.n.22.1.P.Seq | F | [illegible] | CH04MAL
318 | 21068 | RTA00002896F.h.18.1.P.Seq | F | [illegible] | CH01COH
319 | 13440 | RTA00002917F.e.18.1.P.Seq | F | M00032711B:F01 | CH08LNH
320 | 3907 | RTA00002923F.[illegible].18.1.P.Seq | F | [illegible] | CH09LNL
321 | 21391 | RTA00002896F.g.03.1.P.Seq | F | M00004158D:E08 | CH01COH
322 | 6755 | RTA00002918F.l.01.1.P.Seq | F | M00032944A:B07 | CH08LNH
323 | 155939 | RTA00002907F.j.23.1.P.Seq | F | [illegible] | CH03MAH
324 | 8100 | RTA00002896F.g.21.1.P.Seq | F | [illegible] | CH01COH
325 | 4785 | RTA00002919F.j.18.1.P.Seq | F | [illegible] | [illegible]
326 | 14947 | RTA00002902F.k.23.1.P.Seq | F | [illegible] | [illegible]
327 | 8295 | RTA00002903F.k.23.1.P.Seq | F | [illegible] | [illegible]
328 | 156277 | RTA00002907F.l.13.1.P.Seq | F | M00022237D:D06 | CH03MAH
329 | 22751 | RTA00002897F.l.15.1.P.Seq | F | [illegible] | [illegible]
330 | [illegible] | [illegible] | F | [illegible] | [illegible]
331 | 156009 | RTA00002907F.k.05.1.P.Seq | F | M00022220A:A07 | CH03MAH
332 | 9453 | RTA00002907F.k.21.1.P.Seq | F | [illegible] | CH03MAH
333 | 186052 | RTA00002912F.h.08.1.P.Seq | F | [illegible] | CH04MAL
334 | 669 | RTA00002917F.f.22.1.P.Seq | F | M00032723D:H02 | CH08LNH
335 | 11609 | RTA00002899F.f.23.1.P.Seq | F | [illegible] | CH01COH
336 | 186075 | RTA00002911F.k.19.1.P.Seq | F | [illegible] | CH04MAL
337 | 935 | RTA00002911F.l.20.1.P.Seq | F | [illegible] | CH04MAL
338 | 11430 | RTA00002892F.e.07.2.P.Seq | F | [illegible] | CH01COH
339 | 185938 | RTA00002911F.o.24.1.P.Seq | F | [illegible] | CH04MAL
340 | 12394 | RTA00002915F.m.15.2.P.Seq | F | [illegible] | [illegible]
341 | 186588 | RTA00002911F.l.03.1.P.Seq | F | [illegible] | CH04MAL
342 | 23174 | RTA00002909F.e.17.1.P.Seq | F | M00022600D:B0[illegible] | [illegible]
343 | 4727 | RTA00002905F.g.19.1.P.Seq | F | M00008059D:B08 | CH03MAH
344 | 17048 | RTA00002887F.l.10.1.P.Seq | F | M00001416B:A05 | CH01COH
345 | 2354 | RTA00002916F.o.03.1.P.Seq | F | M00032645D:C01 | CH08LNH
346 | 1987 | RTA00002894F.a.13.1.P.Seq | F | M00003974D:E02 | CH01COH
347 | 24483 | RTA00002897F.i.21.1.P.Seq | F | M00004269A:G11 | CH01COH
348 | 33337 | RTA00002896F.f.08.1.P.Seq | F | M00004155A:K03 | CH01COH
349 | 11641 | RTA00002916F.m.19.1.P.Seq | F | M00032637A:F09 | CH08LNH
350 | 10307 | RTA00002910F.l.01.1.P.Seq | F | M00022995C:G07 | CH03MAH






ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
351 | 20388 | RTA00002906F.a.04.1.P.Seq | F | M00021700D:H03 | CH03MAH
352 | 24687 | RTA00002903F.m.02.1.P.Seq | F | M00007043B:E11 | CH02COH
353 | 10414 | RTA00002919F.n.19.1.P.Seq | F | M00033232B:C03 | CH08LNH
354 | 11058 | RTA00002892F.h.16.2.P.Seq | F | M00003820B:F11 | CH01COH
355 | 6574 | RTA00002917F.o.17.1.P.Seq | F | M00032797D:D08 | CH08LNH
356 | 18782 | RTA00002905F.f.07.1.P.Seq | F | M00008021C:G12 | CH03MAH
357 | 35896 | RTA00002896F.d.04.1.P.Seq | F | M00004146C:B04 | CH01COH
358 | 3513 | RTA00002930F.j.10.1.P.Seq | F | M00056217D:E10 | CH15CON
359 | 8820 | RTA00002915F.f.17.2.P.Seq | F | M00028782A:F01 | CH08LNH
360 | 10208 | RTA00002897F.h.08.1.P.Seq | F | M00004251D:D03 | CH01COH
361 | 2089 | RTA00002896F.g.14.1.P.Seq | F | M00004159D:F12 | CH01COH
362 | 170919 | RTA00002909F.p.03.1.P.Seq | F | M00022727A:G01 | CH03MAH
363 | 8727 | RTA00002917F.o.02.1.P.Seq | F | M00032791B:H11 | CH08LNH
364 | 33184 | RTA00002898F.d.08.1.P.Seq | F | M00004324A:D10 | CH01COH
365 | 27973 | RTA00002905F.g.13.1.P.Seq | F | M00008055D:G03 | CH03MAH
366 | 15835 | RTA00002897F.k.13.1.P.Seq | F | M0000427[illegible]:B10 | CH01COH
367 | 10273 | RTA00002903F.n.03.1.P.Seq | F | M00007081B:E09 | CH02COH
368 | 2332 | RTA00002899F.[illegible].03.1.P.Seq | F | M00004502A:D12 | CH01COH
369 | 32022 | RTA00002903F.m.12.1.P.Seq | F | M00007060D:G07 | CH02COH
370 | 68176 | RTA00002893F.g.11.1.P.Seq | F | M00003898C:A01 | CH01COH
371 | 29378 | RTA00002915F.n.14.2.P.Seq | F | M00032508A:E03 | CH08LNH
372 | 23235 | RTA00002925F.k.02.1.P.Seq | F | M00039929B:E06 | CH09LNL
373 | 12111 | RTA00002895F.o.17.1.P.Seq | F | M00004122C:D01 | CH01COH
374 | 5737 | RTA00002924F.k.02.1.P.Seq | F | M00039672C:D05 | CH09LNL
375 | 72475 | RTA00002915F.l.15.1.P.Seq | F | M00032490D:E08 | CH08LNH
376 | 7027 | RTA00002907F.o.01.1.P.Seq | F | M00022264A:B02 | CH03MAH
377 | 17165 | RTA00002903F.d.19.1.P.Seq | F | M00006907A:C09 | CH02COH
378 | 26446 | RTA00002894F.m.17.1.P.Seq | F | M00004047C:B09 | CH01COH
379 | 6755 | RTA00002918F.k.24.1.P.Seq | F | M00032944A:B07 | CH08LNH
380 | 9336 | RTA00002909F.n.02.1.P.Seq | F | M00022703D:B11 | CH03MAH
381 | 6960 | RTA00002916F.o.08.1.P.Seq | F | M00032647B:F06 | CH08LNH
382 | 472 | RTA00002911F.g.01.1.P.Seq | F | M00026936D:C07 | CH04MAL
383 | 9460 | RTA00002908F.c.03.1.P.Seq | F | M00022376D:D05 | CH03MAH
384 | 10307 | RTA00002910F.k.24.1.P.Seq | F | M00022995C:G07 | CH03MAH
385 | 4623 | RTA00002923F.d.22.1.P.Seq | F | M00039222B:A04 | CH09LNL
386 | 141167 | RTA00002905F.c.09.1.P.Seq | F | M00007980A:B01 | CH03MAH
387 | 34011 | RTA00002898F.m.10.1.P.Seq | F | M00004385C:H12 | CH01COH
388 | 5965 | RTA00002915F.a.07.1.P.Seq | F | M00028620C:C07 | CH08LNH
389 | 12336 | RTA00002915F.g.04.1.P.Seq | F | M00028784A:D12 | CH08LNH
390 | 36492 | RTA00002893F.f.18.1.P.Seq | F | M00003891B:H02 | CH01COH
391 | 29803 | RTA00002908F.k.06.1.P.Seq | F | M00022467D:B03 | CH03MAH
392 | 4420 | RTA00002920F.[illegible].15.1.P.Seq | F | M00033326B:B05 | CH08LNH
393 | 15097 | RTA00002923F.b.06.1.P.Seq | F | M0003917[illegible]A:F01 | CH09LNL
394 | 19133 | RTA00002894F.g.03.1.P.Seq | F | M00003993C:D07 | CH01COH
395 | 9810 | RTA00002905F.c.03.1.P.Seq | F | M00007975C:A10 | CH03MAH
396 | 31562 | RTA00002897F.a.09.1.P.Seq | F | M00004210A:A03 | CH01COH
397 | 1499 | RTA00002912F.k.11.1.P.Seq | F | M00027475D:A01 | CH04MAL
398 | 29531 | RTA00002907F.o.05.1.P.Seq | F | M00022265A:F11 | CH03MAH
399 | 4237 | RTA00002918F.j.20.1.P.Seq | F | M00032928C:D02 | CH08LNH
400 | 28660 | RTA00002905F.p.11.1.P.Seq | F | M00021690A:C03 | CH03MAH






SEQ ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
401 | [illegible] | [illegible] | F | M00004[illegible]41C:E0[illegible] | CH01COH
402 | [illegible] | RTA00002898F.c.20.1.P.Seq | F | M00004322B:D03 | CH01COH
403 | [illegible] | RTA0000291[illegible]F.c.1[illegible].1.P.Seq | F | M00028774D:E10 | CH08LNH
404 | [illegible] | RTA00002894F.l.13.1.P.Seq | F | M00004042B:A11 | CH01COH
405 | [illegible] | RTA00002898F.c.09.1.P.Seq | F | M00004330A:A01 | CH01COH
406 | [illegible] | RTA00002910F.e.17.1.P.Seq | F | M00022856D:A07 | CH03MAH
407 | [illegible] | RTA00002923F.g.19.1.P.Seq | F | M00039255D:B01 | CH09LNL
408 | [illegible] | RTA00002904F.a.18.1.P.Seq | F | M00007155C:D07 | CH02COH
409 | [illegible] | RTA00002900F.j.21.1.P.Seq | F | M00005349C:C02 | CH02COH
410 | [illegible] | RTA00002893F.[illegible].08.1.P.Seq | F | M00003888B:F09 | CH01COH
411 | [illegible] | RTA00002905F.m.16.1.P.Seq | F | M00021650D:A11 | CH03MAH
412 | [illegible] | RTA00002930F.a.19.2.P.Seq | F | M00033850B:H10 | CH15CON
413 | [illegible] | RTA00002902F.f.12.1.P.Seq | F | M00006631D:D02 | CH02COH
414 | [illegible] | RTA00002896F.m.17.1.P.Seq | F | M00004189A:C12 | CH01COH
415 | [illegible] | RTA00002901F.a.17.1.P.Seq | F | M00005423C:D07 | CH02COH
416 | [illegible] | RTA00002896F.i.14.1.P.Seq | F | M00004169A:E04 | CH01COH
417 | 14397 | RTA00002896F.j.11.1.P.Seq | F | M00004172D:B12 | CH01COH
418 | [illegible] | RTA00002888F.c.21.1.P.Seq | F | M00001444C:D11 | CH01COH
419 | [illegible] | RTA00002893F.j.11.1.P.Seq | F | M00003914A:A08 | CH01COH
420 | 24186 | RTA00002914F.n.02.1.P.Seq | F | M00028366B:B08 | CH08LNH
421 | [illegible] | RTA00002921F.c.0[illegible].1.P.Seq | F | M00033342B:F03 | CH09LNL
422 | [illegible] | RTA00002911F.f.06.1.P.Seq | F | M00026907D:E07 | CH04MAL
423 | [illegible] | RTA00002915F.d.15.1.P.Seq | F | M00028771A:E02 | CH08LNH
424 | [illegible] | RTA00002894F.k.09.1.P.Seq | F | M00004036D:C12 | CH01COH
425 | 9259 | RTA00002918F.b.09.1.P.Seq | F | M00032830D:D02 | CH08LNH
426 | [illegible] | RTA00002905F.o.23.1.P.Seq | F | M00021681C:C09 | CH03MAH
427 | 8438 | RTA00002916F.i.02.1.P.Seq | F | M00032590B:H01 | CH08LNH
428 | 4334 | RTA00002919F.o.12.1.P.Seq | F | M00033248D:H11 | CH08LNH
429 | 9304 | RTA00002915F.c.19.1.P.Seq | F | M00028764B:D03 | CH08LNH
430 | [illegible] | RTA00002910F.j.04.1.P.Seq | F | M00022964A:B03 | CH03MAH
431 | 136532 | RTA00002912F.a.01.1.P.Seq | F | M00027189C:B10 | CH04MAL
432 | [illegible] | RTA00002894F.i.13.1.P.Seq | F | M00004029D:A01 | CH01COH
433 | 8824 | RTA00002903F.d.17.1.P.Seq | F | M00006877B:C09 | CH02COH
434 | 4063 | RTA00002916F.k.01.1.P.Seq | F | M00032613A:E11 | CH08LNH
435 | [illegible] | RTA00002896F.i.18.1.P.Seq | F | M00004170A:F03 | CH01COH
436 | 9233 | RTA0000291[illegible]F.[illegible].20.1.P.Seq | F | M00032473B:A03 | CH08LNH
437 | [illegible] | RTA00002914F.f.1[illegible].1.P.Seq | F | M00028196A:G03 | CH08LNH
438 | [illegible] | RTA00002886F.p.16.1.P.Seq | F | M00001382D:H08 | CH01COH
439 | 8300 | RTA00002888F.c.20.1.P.Seq | F | M00001444B:E04 | CH01COH
440 | 3224 | RTA00002916F.d.2[illegible].1.P.Seq | F | M00032556D:A03 | CH08LNH
441 | 95423 | RTA00002909F.k.24.1.P.Seq | F | M00022674C:H08 | CH03MAH
442 | 7911 | [illegible] | [illegible] | [illegible] | [illegible]
443 | 88052 | RTA00002925F.p.11.1.P.Seq | F | M00040041D:F01 | CH09LNL
444 | 32736 | RTA00002900F.l.20.1.P.Seq | F | M00005367D:A11 | CH02COH
445 | 20811 | RTA00002896F.n.14.1.P.Seq | F | M00004192C:B06 | CH01COH
446 | 12856 | RTA00002908F.b.07.1.P.Seq | F | M000223[illegible]8A:B11 | CH03MAH
447 | 12190 | RTA00002899F.b.10.1.P.Seq | F | M00004430B:B10 | CH01COH
448 | 10546 | RTA00002901F.o.08.1.P.Seq | F | M00005689C:B02 | CH02COH
449 | 21041 | RTA00002898F.k.08.1.P.Seq | F | M00004372A:E12 | CH01COH
450 | 16484 | RTA00002894F.c.04.1.P.Seq | F | M00003979B:A04 | CH01COH
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SEQ 
ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
451 | 7741 | RTA00002894F.i.08.1.P.Seq | F | M00004028B:F10 | CH01COH
452 | 14921 | RTA00002926F.e.15.2.P.Seq | F | M00040081C:E02 | CH09LNL
453 | 17571 | RTA00002900F.m.16.1.P.Seq | F | M00005375D:A10 | CH02COH
454 | 46881 | RTA00002901F.l.20.1.P.Seq | F | M00005622A:H02 | CH02COH
455 | 21533 | RTA00002898F.j.10.1.P.Seq | F | M00004365C:C09 | CH01COH
456 | 19010 | RTA00002916F.k.08.1.P.Seq | F | M00032614D:D08 | CH08LNH
457 | 48768 | RTA00002886F.n.01.1.P.Seq | F | M00001374C:B10 | CH01COH
458 | 7515 | RTA00002892F.p.22.2.P.Seq | F | M00003355C:F02 | CH01COH
459 | 17326 | RTA00002898F.h.02.1.P.Seq | F | M00004350A:A04 | CH01COH
460 | 3902 | RTA00002901F.d.17.1.P.Seq | F | M00005460D:C11 | CH02COH
461 | 12400 | RTA00002901F.d.18.1.P.Seq | F | M00005461A:D12 | CH02COH
462 | 186543 | RTA00002912F.a.06.1.P.Seq | F | M00027193C:A07 | CH04MAL
463 | 4063 | RTA00002916F.j.24.1.P.Seq | F | M00032613A:E11 | CH08LNH
464 | 6267 | RTA00002910F.d.20.1.P.Seq | F | M00022335C:A09 | CH03MAH
465 | 21349 | RTA00002901F.c.04.1.P.Seq | F | M00005445D:F11 | CH02COH
466 | 1123 | RTA00002894F.i.24.1.P.Seq | F | M00004031C:G06 | CH01COH
467 | 4401 | RTA00002918F.m.13.1.P.Seq | F | M00032979D:C11 | CH08LNH
468 | 15255 | RTA00002925F.p.10.1.P.Seq | F | M00040041A:G08 | CH09LNL
469 | 10991 | RTA00002933F.a.15.1.P.Seq | F | M00043077C:G10 | CH19COP
470 | 48768 | RTA00002886F.m.24.1.P.Seq | F | M00001374C:B10 | CH01COH
471 | 20406 | RTA00002900F.c.20.1.P.Seq | F | M00004352D:C06 | CH02COH
472 | 39784 | RTA00002886F.g.05.1.P.Seq | F | M00001352B:B02 | CH01COH
473 | 36567 | RTA00002886F.n.06.1.P.Seq | F | M00001375B:D04 | CH01COH
474 | 14317 | RTA00002902F.a.18.1.P.Seq | F | M00005771D:C02 | CH02COH
475 | 156277 | RTA00002907F.l.13.2.P.Seq | F | M00022237D:D06 | CH03MAH
476 | 6898 | RTA00002907F.a.22.1.P.Seq | F | M00022104A:G08 | CH03MAH
477 | 17376 | RTA00002902F.c.03.1.P.Seq | F | M00005819D:F09 | CH02COH
478 | 186535 | RTA00002912F.d.12.1.P.Seq | F | M00027270A:D04 | CH04MAL
479 | 91616 | RTA00002910F.b.24.1.P.Seq | F | M00022812A:G01 | CH03MAH
480 | 91616 | RTA00002910F.c.01.1.P.Seq | F | M00022812A:G01 | CH03MAH
481 | 6993 | RTA00002896F.j.12.1.P.Seq | F | M00004172D:F04 | CH01COH
482 | 12443 | RTA00002916F.a.20.1.P.Seq | F | M00032534B:E12 | CH08LNH
483 | 28585 | RTA00002901F.j.16.1.P.Seq | F | M00005570A:D05 | CH02COH
484 | 9453 | RTA00002907F.k.21.2.P.Seq | F | M00022228B:B11 | CH03MAH
485 | 156009 | RTA00002907F.k.05.2.P.Seq | F | M00022220A:A07 | CH03MAH
486 | 5958 | RTA00002903F.n.22.2.P.Seq | F | M00022507C:C08 | CH03MAH
487 | 155939 | RTA00002907F.j.23.2.P.Seq | F | M00022218B:B12 | CH03MAH
488 | 16695 | RTA00002886F.g.22.1.P.Seq | F | M00001353D:E05 | CH01COH
489 | 10118 | RTA00002886F.h.18.1.P.Seq | F | M00001356D:E06 | CH01COH
490 | 13288 | RTA00002930F.b.21.1.P.Seq | F | M00042891C:G03 | CH15CON
491 | 3210 | RTA00002910F.h.22.1.P.Seq | F | M00022945A:H09 | CH03MAH
492 | 15014 | RTA00002934F.a.18.1.P.Seq | F | M00043528A:E11 | CH20COHLV
493 | 22087 | RTA00002890F.i.19.1.P.Seq | F | M00001624A:C01 | CH01COH
494 | 31943 | RTA00002908F.i.12.1.P.Seq | F | M00022454C:B08 | CH03MAH
495 | 11593 | RTA00002906F.p.21.1.P.Seq | F | M00022094B:G02 | CH03MAH
496 | 3131 | RTA00002908F.m.17.1.P.Seq | F | M00022494B:D06 | CH03MAH
497 | 151263 | RTA00002906F.i.21.1.P.Seq | F | M00021[illegible]D:F09 | CH03MAH
498 | 177542 | RTA00002910F.h.23.1.P.Seq | F | M00022945B:F11 | CH03MAH
499 | 9738 | RTA00002924F.f.23.1.P.Seq | F | M00039559B:C07 | CH09LNL
500 | 15313 | RTA00002923F.f.05.1.P.Seq | F | M00039S65A:C09 | CH09LNL






SEQ ID | CLUSTER | SEQ NAME | ORIENTATION | CLONE ID | LIBRARY
501 | 19724 | RTA00002906F.p.05.1.P.Seq | F | M00022088D:E10 | CH03MAH
502 | 10731 | RTA00002893F.m.11.1.P.Seq | F | M00003938C:A05 | CH01COH
503 | 10257 | RTA00002901F.h.09.1.P.Seq | F | M00005512B:H01 | CH02COH
504 | 186468 | RTA00002913F.b.18.1.P.Seq | F | M00027746A:D06 | CH04MAL
505 | 14736 | RTA00002908F.g.22.1.P.Seq | F | M00022436C:F11 | CH03MAH
506 | 33267 | RTA00002889F.h.14.1.P.Seq | F | M00001548B:D06 | CH01COH
507 | 7719 | RTA00002908F.e.11.1.P.Seq | F | M00022403C:E12 | CH03MAH
508 | 185539 | RTA00002913F.b.03.1.P.Seq | F | M00027717C:C06 | CH04MAL
509 | 14825 | RTA00002924F.f.19.1.P.Seq | F | M00039556C:G05 | CH09LNL
510 | 3917 | RTA00002906F.p.15.1.P.Seq | F | M00022092D:A11 | CH03MAH
511 | 18718 | RTA00002895F.h.05.1.P.Seq | F | M00004085B:H02 | CH01COH
512 | 186762 | RTA00002918F.b.11.1.P.Seq | F | M00032831A:C07 | CH08LNH
513 | 2732 | RTA00002925F.i.07.1.P.Seq | F | M00039900B:G04 | CH09LNL
514 | 7684 | RTA00002924F.j.17.1.P.Seq | F | M00039668C:F01 | CH09LNL
515 | 6852 | RTA00002922F.c.22.1.P.Seq | F | M00039001A:B10 | CH09LNL
516 | 1422 | RTA00002924F.c.09.1.P.Seq | F | M00039433C:E03 | CH09LNL
517 | 5560 | RTA00002917F.i.01.1.P.Seq | F | M00032734C:C05 | CH08LNH
518 | 48734 | RTA00002908F.l.23.1.P.Seq | F | M00022487C:C02 | CH03MAH
519 | 10486 | RTA00002899F.g.07.1.P.Seq | F | M00004509B:B10 | CH01COH
520 | 33514 | RTA00002890F.j.03.1.P.Seq | F | M00001626A:D07 | CH01COH
521 | 5821 | RTA00002917F.m.01.1.P.Seq | F | M00032772D:D03 | CH08LNH
522 | 5821 | RTA00002917F.l.24.1.P.Seq | F | M00032772D:D03 | CH08LNH
523 | 21940 | RTA00002896F.a.03.1.P.Seq | F | M00004134A:A08 | CH01COH
524 | 185724 | RTA00002912F.m.08.1.P.Seq | F | M00027523A:H05 | CH04MAL
525 | 182887 | RTA00002910F.k.21.1.P.Seq | F | M00022992A:H06 | CH03MAH
526 | 21346 | RTA00002901F.g.24.1.P.Seq | F | M00005507B:A03 | CH02COH
527 | 5501 | RTA00002887F.n.12.1.P.Seq | F | M00001424B:H06 | CH01COH
528 | 13961 | RTA00002892F.j.14.1.P.Seq | F | [illegible] | CH01COH
529 | 16784 | RTA00002886F.a.09.1.P.Seq | F | M00001338D:D01 | CH01COH
530 | [illegible] | [illegible] | [illegible] | [illegible] | [illegible]
531 | 3304 | RTA00002898F.d.05.1.P.Seq | F | M00004324A:B03 | CH01COH
532 | 14895 | RTA00002901F.g.14.1.P.Seq | F | M00005504C:F12 | CH02COH
533 | 16036 | RTA00002891F.k.09.1.P.Seq | F | M00003763B:B10 | CH01COH
534 | 23877 | RTA00002891F.k.15.1.P.Seq | F | M00003764B:F11 | CH01COH
535 | 186784 | RTA00002930F.i.17.1.P.Seq | F | M00056105A:D06 | CH15CON
536 | 13591 | RTA00002901F.i.15.1.P.Seq | F | M00005485C:H04 | CH02COH
537 | 17916 | RTA00002906F.p.08.1.P.Seq | F | M00022090B:A10 | CH03MAH
538 | 40594 | RTA00002897F.i.15.1.P.Seq | F | M00004266B:F07 | CH01COH
539 | 9677 | RTA00002925F.i.21.1.P.Seq | F | M00039915B:E08 | CH09LNL
540 | 7736 | RTA00002887F.e.03.1.P.Seq | F | M00001393B:C03 | CH01COH
541 | 2474 | RTA00002917F.e.15.1.P.Seq | F | M00032707D:F08 | CH08LNH
542 | 23810 | RTA00002892F.i.06.1.P.Seq | F | M00003822D:A02 | CH01COH
543 | 24633 | RTA00002907F.i.19.1.P.Seq | F | M00022208B:D03 | CH03MAH
544 | 72081 | RTA00002925F.k.03.1.P.Seq | F | M00039929D:H10 | CH09LNL
545 | 5991 | RTA00002916F.i.17.1.P.Seq | F | M00032597A:H02 | CH08LNH
546 | 14596 | RTA00002911F.n.15.1.P.Seq | F | M00027131A:B03 | CH04MAL
547 | 6923 | RTA00002896F.d.01.1.P.Seq | F | M00004146B:E08 | CH01COH
548 | 6923 | RTA00002896F.c.24.1.P.Seq | F | M00004146B:E08 | CH01COH
549 | 21851 | RTA00002887F.d.09.1.P.Seq | F | M00001391D:D03 | CH01COH
550 | 3935 | RTA00002925F.j.08.1.P.Seq | F | M00039921C:H11 | CH09LNL



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ID 


CLUSTER 


SEQ NAME 


ORIENTATION 


CLONE ID 


LIBRARY 


551 | 13328 | RTA00002909F.h.08.1.P.Seq | F | M00022634B:H09 | CH03MAH
552 | 2492 | RTA00002887F.e.11.1.P.Seq | F | M00001393D:E02 | CH01COH
553 | 11960 | RTA00002917F.b.03.1.P.Seq | F | M00032671B:D08 | CH08LNH
554 | 186084 | RTA00002912F.f.18.1.P.Seq | F | M00027319D:F07 | CH04MAL
555 | 13644 | RTA00002925F.a.09.1.P.Seq | F | M00039805B:B06 | CH09LNL
556 | 5707 | RTA00002909F.k.13.1.P.Seq | F | M00022672C:H04 | CH03MAH
557 | 95700 | RTA00002911F.p.14.1.P.Seq | F | M00027182B:G06 | CH04MAL
558 | 342 | RTA00002922F.[illegible].23.1.P.Seq | F | M00039076D:G04 | CH09LNL
559 | 8481 | RTA00002887F.c.12.1.P.Seq | F | M00001389D:D06 | CH01COH
560 | 12575 | RTA00002916F.i.12.1.P.Seq | F | M00032594C:F05 | CH08LNH
561 | 40712 | RTA00002921F.d.08.1.P.Seq | F | M00033359C:H05 | CH09LNL
562 | 10768 | RTA00002886F.d.24.1.P.Seq | F | M00001346B:G11 | CH01COH
563 | 38781 | RTA00002889F.k.23.1.P.Seq | F | M00001559A:H09 | CH01COH
564 | 8790 | RTA00002888F.g.08.1.P.Seq | F | M00001461D:C10 | CH01COH
565 | 10167 | RTA00002916F.k.22.1.P.Seq | F | M00032621A:F11 | CH08LNH
566 | 13706 | RTA00002905F.e.21.1.P.Seq | F | M00008019B:A01 | CH03MAH
567 | 124172 | RTA00002900F.a.09.1.P.Seq | F | M00004824A:D12 | CH02COH
568 | 92126 | RTA00002910F.g.12.1.P.Seq | F | M00022904C:D04 | CH03MAH
569 | 5830 | RTA00002916F.j.09.1.P.Seq | F | M00032605B:D09 | CH08LNH
570 | 15154 | RTA00002886F.p.13.1.P.Seq | F | M00001382D:A07 | CH01COH
571 | 25813 | RTA00002910F.i.12.1.P.Seq | F | M00022952A:B02 | CH03MAH
572 | 17268 | RTA00002886F.d.07.1.P.Seq | F | M00001344D:E08 | CH01COH
573 | 13684 | RTA00002915F.j.09.1.P.Seq | F | M00031485B:G05 | CH08LNH
574 | 13460 | RTA00002898F.f.19.1.P.Seq | F | M00004341C:A09 | CH01COH
575 | 25115 | RTA00002919F.p.18.1.P.Seq | F | M00033311B:G10 | CH08LNH
576 | 19949 | RTA00002905F.e.17.1.P.Seq | F | M00008016B:E09 | CH03MAH
577 | 24266 | RTA00002917F.k.06.1.P.Seq | F | M00032759A:A03 | CH08LNH
578 | 8243 | RTA00002901F.o.17.1.P.Seq | F | M00005703B:E03 | CH02COH
579 | 12576 | RTA00002900F.k.23.1.P.Seq | F | M00005359B:B08 | CH02COH
580 | 28531 | RTA00002909F.c.04.1.P.Seq | F | M00022559D:G10 | CH03MAH
581 | 15153 | RTA00002894F.o.21.1.P.Seq | F | M00004054A:D03 | CH01COH
582 | 9498 | RTA00002894F.e.04.1.P.Seq | F | M00003985D:B02 | CH01COH
583 | 48140 | RTA00002914F.h.13.1.P.Seq | F | M00028211A:F10 | CH08LNH
584 | 7626 | RTA00002895F.b.04.1.P.Seq | F | M00004061B:E05 | CH01COH
585 | 22668 | RTA00002896F.p.17.1.P.Seq
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1465 


16303


RTA00002888F.b.12.1.P.Seq


F


M00001438A:E01


CH01COH


1466 | 16828 | RTA00002897F.b.04.1.P.Seq | F | M00004214A:E05 | CH01COH
1467 | 14295 | RTA00002921F.a.18.1.P.Seq | F | M00033296C:C11 | CH09LNL
1468 | 1979 | RTA00002930F.f.06.1.P.Seq | F | M000557[illegible]D:D09 | CH15CON
1469 | 36248 | RTA00002888F.g.05.1.P.Seq | F | M00001460C:E10 | CH01COH
1470 | 5676 | RTA00002926F.b.22.2.P.Seq | F | M0004007[illegible]B:A05 | CH09LNL
1471 | [illegible] | RTA00002887F.o.[illegible].1.P.Seq | F | M00001428B:C10 | [illegible]
1472 | 7937 | RTA00002917F.[illegible].22.1.P.Seq | F | [illegible] | CH08LNH
1473 | 4483 | RTA00002911F.d.22.2.P.Seq | F | M00026856B:G03 | CH04MAL
1474 | 7796 | RTA00002925F.c.05.1.P.Seq | F | M00039826B:F09 | CH09LNL
1475 | 17330 | RTA00002915F.a.03.1.P.Seq | F | M00028616C:D09 | CH08LNH
1476 | 25620 | RTA00002902F.f.09.1.P.Seq | F | M000066[illegible]1C:A04 | CH01COH
1477 | 20601 | RTA00002923F.[illegible].20.1.P.Seq | F | M000393[illegible]6A:G07 | CH09LNL
1478 | 6205 | RTA00002923F.o.21.1.P.Seq | F | M00039[illegible]58C:C01 | CH09LNL
1479 | 726 | RTA00002913F.b.16.1.P.Seq | F | M0002773[illegible]D:C0[illegible] | CH04MAL
1480 | 104999 | RTA0000[illegible]F.o.17.1.P.Seq | F | [illegible] | CH03MAH
1481 | 303[illegible]1 | RTA00002919F.o.17.1.P.Seq | F | [illegible] | CH08LNH
1482 | 5878 | RTA00002913F.a.16.1.P.Seq | F | M00027688C:C01 | CH04MAL
1483 | 5944 | RTA00002905F.m.07.1.P.Seq | F | M00021640B:A0[illegible] | CH03MAH
1484 | 5796 | RTA00002908F.i.21.1.P.Seq | F | [illegible] | CH03MAH
1485 | 3804 | RTA00002935F.m.24.1.P.Seq | F | M00055254A:H03 | CH17COHLV
1486 | 2728 | RTA00002918F.a.22.1.P.Seq | F | M00032828A:A06 | CH08LNH
1487 | 3804 | RTA00002935F.n.01.1.P.Seq | F | M00055254A:H03 | CH17COHLV
1488 | 393[illegible] | RTA00002915F.o.19.2.P.Seq | F | [illegible] | CH08LNH
1489 | 16691 | RTA00002891F.o.03.1.P.Seq | F | [illegible] | CH01COH
1490 | 15430 | RTA00002900F.a.10.1.P.Seq | F | [illegible] | CH01COH
1491 | 5637 | RTA00002925F.b.18.1.P.Seq | F | M00039820B:F06 | CH09LNL
1492 | 16633 | RTA00002897F.[illegible].15.1.P.Seq | F | [illegible] | CH01COH
1493 | 21826 | RTA00002898F.g.06.1.P.Seq | F | M0000434[illegible]A:G11 | CH01COH
1494 | 22193 | RTA00002919F.i.09.1.P.Seq | F | M0003314[illegible]D:A03 | CH08LNH
1495 | 10720 | RTA00002898F.c.14.1.P.Seq | F | M0000432[illegible]C:E07 | CH01COH
1496 | 22491 | RTA00002925F.m.06.1.P.Seq | F | M00040003A:G10 | CH09LNL
1497 | 10423 | RTA00002915F.n.13.2.P.Seq | F | M0003250[illegible]D:G08 | CH08LNH
1498 | 4953 | RTA00002916F.h.11.1.P.Seq | F | M000325[illegible]C:B04 | CH08LNH
1499 | 185567 | RTA00002911F.p.08.1.P.Seq | F | M00027178B:A11 | CH04MAL
1500 | 25605 | RTA00002924F.m.22.1.P.Seq | F | M000397[illegible]:A01 | CH09LNL
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ID 


CLUSTER 


SEQ NAME 


ORIENTATION 


CLONE ID 


LIBRARY 


1501 | 29446 | RTA00002906F.m.24.1.P.Seq | F | M00022070B:B04 | CH03MAH
1502 | 9668 | RTA00002908F.g.02.1.P.Seq | F | M00022421A:F12 | CH03MAH
1503 | 29446 | RTA00002906F.n.01.1.P.Seq | F | M00022070B:B04 | CH03MAH
1504 | 7171 | RTA00002887F.m.22.1.P.Seq | F | M00001421B:E07 | CH01COH
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Table 3t' 





1 Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlasLX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


f DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE


1


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE>


<NONE>


2


<NONE> 


<NONE> 


<NONE> 


<NONE> 






3


<NONE> 


<NONE> 


<NONE> 


<NONE> 




<NONE>


4 <NONE> 


<NONE>


<NONE> 


<NONE> 


<NONE>


<NONE>


5 <NONE>


<NONE> 


<NONE> 


<NONE> 


<NONE>




6 <NONE> 


<NONE> 


<NONE> 


<NONE> 






7 <NONE> 


<NONE> 


<NONE> <NONE>


<NONE>




8 <NONE> 


<NONE> 


<NONE> <NONE>


<NONE> 




9 <NONE>


<NONE> 


<NONE>


<NONE>


<NONE>




10 


<NONE> 


<NONE> 


<NONE> 


<NONE>


<NONE> 




11 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 




12 <NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 




13 <NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


14 <NONE>


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


15 <NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


16 <NONE>


<NONE> 


<NONE> 


<NONE>


<NONE>


<NONE>


17 <NONE>


<NONE>


<NONE> <NONE>


<NONE>


<NONE>


18 <NONE>


<NONE> 


<NONE> <NONE> 






19 


<NONE> 


<NONE> 


<NONE> <NONE> 


<NONE> 


<NONE>


20 


<NONE>


<NONE> 


<NONE> <NONE> 


<NONE>


<NONE>


21 <NONE> 


<NONE> 


<NONE> <NONE> 


<NONE> 


<NONE>


22 <NONE>


<NONE>


<NONE> <NONE>


<NONE> 


<NONE> 




23 <NONE>


<NONE>


<NONE>


54S562 


GENOME POLYPROTEIN

[contains: RNA
replicase, helicase,
coat protein] 2.7.7.48) -

apple stem grooving virus


9.2


24 


<NONE> 


<NONE> 


<NONE>


416959 


EXCISION REPAIR PROTEIN
ERCC-6 DNA repair helicase 
ERCC6- human >gi|1821Sl 
(L04791) excision repair protein 
[Homo sapiens] 


8.9 


25 


<NONE> 


<NONE> 


<NONE>


3327096


(AB014541) KIAA0641 protein
[Homo sapiens] 




8 " 


26 


<NONE> 


<NONE> 


<NONE> 


861293 


^8741) F35D"' 1 *ene 

product [Caenorhabditis

elegans]


7.9 


27 


<NONE> 


<NONE> 


<NONE> 


3297821 


(AL031032) extensin-like
protein


5.5 


28


<NONE> 


<NONE> 


<NONE> 


2119692


transforming growth factor-beta

type III receptor - chicken

>gi|5HS43(L01121)

transforming growth factor-beta

type III receptor [Gallus gallus]


5.1 


29 


<NONE> 


<NONE> 


<NONE> 


213602S | 


protein kinase PRK1 - human


5.0 

















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Nearest Neighbor (BlastN vs. GenbanJO 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















30 


<NONE> 


<NONE> 


<NONE> 


2746912 


(AF040659) No definition line
found [Caenorhabditis elegans]


4.6 


31 


<NONE> 


<NONE> 


<NONE> 


2358287 


(AF010404) ALR [Homo 
sapiens]


4.5 


32 


<NONE> 


<NONE> 


<NONE> 


3877816 


(Z96048) predicted using
Genefinder; cDNA EST
EMBL:D65516 comes from this
gene; cDNA EST yk191a5.5
comes from this gene 
[Caenorhabditis elegans] 


4.4 


33 


<NONE> 


<NONE> 


<NONE> 


4140268 


(Y14953) SRCR domain, 
membrane form 2 


4.1 


34 


<NONE> 


<NONE> 


<NONE> 


1708663 


(U51 183) transposase [Hydra 
vulgaris] 


4.0 




35


<NONE>


<NONE> 


<NONE> 


1184100 


(U45958) pistil extensin-like 
protein [Nicotiana alata]


3.9 


36 


<NONE> 


<NONE> 


<NONE> 


121073 


GLUCOCORTICOID 
RECEPTOR (GR) 


3.9 


37 


<NONE> 


<NONE> 


<NONE> 


1718298 


(U75698) ORF 45; contains an 
extended acidic domain; EBV 
BKRF4 homolog [Kaposi's
sarcoma-associated herpesvirus]
homology conserved in other 
gamma-herpesviruses 


2.6 


38 


<NONE> 


<NONE> 


<NONE> 


2352538 


(AF006564) alcohol 
dehydrogenase [Drosophila 
persimilis] persimilis]


1.4 


39 


<NONE> 


<NONE> 


<NONE> 


3192897 


[Dictyostelium discoideum] 


1.4 


40 


<NONE> 


<NONE> 


<NONE> 



561645 


(L33421) This CDS feature is 
included to show the translation 
of the corresponding V_region. 
Presently translation qualifiers 
on V_region features are illegal


1.0 


41 


<NONE> 


<NONE> 


<NONE> 


3878S57 


(ZSJiid; predicted using 
Genefinder; cDNA EST 
EMBL:D35016 comes from this 
gene; cDNA EST 
EMBL:D32583 comes from this
gene; cDNA EST
EMBL:D35258 comes from this
gene; cDNA EST
EMBL:C11471 comes from this
gene; cDNA EST EMBL:C... 


1.0 



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Nearest Neighbor (BlastN vs. Genbank) I Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



SEQ 
ID 


ACCESSION


DESCRIPTION 


P VALUE 


ACCESSION


DESCRIPTION 


P VALUE 











42


<NONE>


<NONE>


<NONE>


1658571


(U75903) UGT1A7 [Rattus
norvegicus]


1.0 


43 


<NONE> 


<NONE> 


<NONE> 


2338034 


(AF005370) putative immediate
early protein [Alcelaphine
herpesvirus 1]


0.86 


44 


<NONE> 


<NONE> 


<NONE> 


3043714 


(AB011167) KIAA0595 protein
[Homo sapiens] 


0.42 



45 


<NONE> 


<NONE> 


<NONE> 


1723710 


HYPOTHETICAL VLI KJJ 
PROTEIN IN ASN2-PHB1 
INTERGENIC REGION 
>gi|2131678|pir||S64439
hypothetical protein YGR130c -
yeast (Saccharomyces
cerevisiae)

>gi|1323215|gnl|PID|e243523
(Z72915) ORF YGR130c
[Saccharomyces cerevisiae]


0.40 


46 


<NONE> 


<NONE> 


<NONE> 


1723710 


HYFOTHEl 1LAL ^Ll iUJ 
PROTEIN IN ASN2-PHB1
INTERGENIC REGION
>gi|2131678|pir||S64439
hypothetical protein YGR130c -
yeast (Saccharomyces
cerevisiae)

>gi|1323215|gnl|PID|e243523
(Z72915) ORF YGR130c
[Saccharomyces cerevisiae]


0.38 


47 


<NONE> 


<NONE> 


<NONE> 


2996117 


(AF046125) immediate early 2 
[Rat cytomegalovirus]


0.26 


48 


<NONE> 


<NONE> 


<NONE> 


4151809 


(AF102855) synaptic SAPAP-
interacting protein Synamon


0.024 


49 


<NONE> 


<NONE> 


<NONE> 


2773341 


(AF040954) putative protein 
phosphatase 1 nuclear targeting 
subunit [Rattus norvegicus]


0.017 


50 


<NONE> 


<NONE> 


<NONE> 


1653522 


(D90914) hypothetical protein 


3e-04 


51 


<NONE> 


<NONE> 


<NONE> 


3219965 


HYPOTHETICAL 100.6 KD 
TRP-ASP REPEATS 
CONTAINING PROTEIN 
C2C6.04C IN CHROMOSOME 
I 


3e-06 


52 


<NONE> 


<NONE> 


<NONE>


4185567 


(AF115480) cAMP-dependent
Rap1 guanine-nucleotide
exchange factor [Mus musculus] 


7e-07
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Nearest Neighbor (BlasiN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ
ID


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 












53


<NONE>


<NONE>


<NONE>


1176527


HYPOTHETICAL 4 j.2 KD
PROTEIN C34E10.1 IN
CHROMOSOME III
>gi|500724 (U10402) C34E10.1
gene product [Caenorhabditis
elegans]


3e-20 


54 



XS5444 


G. pallida repetitive 
DNA element 


5.0 


2118936 


beta-globin - chimpanzee 
(fragment) 


8.6 


55 


X72961


Synechococcus sp. 
cpeB, cpeA genes and
ORF3 


5.0 


462569 


MICROTUBULE- 
ASSOCIATED PROTEIN 1A 
microtubule-associated protein 
MAP1A - rat >gi|205538
norvegicus]


2.2 


56 


U94747 


Human WD repeat 
protein HAN11
mRNA, complete cds


5.0 


3875538 


(Z67990) similar to cuticle 
collagen 


1.3 


57 


AF032108 


Homo sapiens 
integrin alpha-7 
mRNA, complete cds


5.0 


2147194 


collagen - Paralvinella grasslei


0.002 


58 


250798 


G.gallus mRNA for
p52 


5.0 


3122885 


ASPARTYL-TRNA
SYNTHETASE synthetase
[Bacillus subtilis] 


3e-11


59 


AB002384 


Human mRNA for
KIAA0386 gene, 
complete cds 


5.0


2632098 


(Y15513) Prodos protein
[Drosophila melanogaster]


9e-12 






60


X14835


Thermofilum pendens
DNA for 16S and
23S ribosomal RNA,
tRNA-Met, and tRNA-
Gly


4.9 


<NONE> 


<NONE> 


<NONE> 


61 


U87J49 


Hordeum vulgare 
nucellin gene, 
complete cds 


4.9 


128578 


NONSTRUCTURAL 
PROTEIN NS-S spotted wilt 
virus (strain CPNH1) non- 
structural protein [Tomato 
spotted wilt virus] 


2.8 


62 


DS7541 


Mus musculus gene
for integrin alpha v
subunit, promoter
region


4.9 


136956 


HYPOTHETICAL PROTEIN 
UL61 cytomegalovirus (strain
AD169) cytomegalovirus]


0.038 


63 


U72520 


Mus musculus mena
protein (Mena)
mRNA, complete cds


4.9 


3413892


(AB007934) KIAA0465 protein 
[Homo sapiens] 


6e-07 
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Nearest Neighbor (BlasiN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















64


S79797 


enzymatic 
glycosylation- 
regulating gene [rats,
Sprague-Dawley,
streptozotocin
diabetic, heart,
mRNA, 5010 nt]


4.8 


<NONE> 


<NONE> 


<NONE> 




65


AB011102


Homo sapiens mRNA
for KIAA0530
protein, partial cds 


4.8 



138022 


RECEPTOR RECOGNIZING 
PROTEIN gp38 - phage Ox2
>gi|15126 (X05675) gene 38
(AA 1-266); pid:g15126
[Bacteriophage Ox2]


3.6 


66 


AF100985 


Penaeus monodon 
phosphopyruvate 
hydratase mRNA, 
complete cds 



4.8 


500615 


(D16221) endochitinase [Oryza 
sativa] 


2.8 


67 


U31756 


Bacillus subtilis
gamma- 
aminobutyrate 
permease cds 


4.8


3880699


(AL021471) similar to 
Eukaryotic aspartyl proteases 
[Caenorhabditis elegans] 
Eukaryotic aspartyl proteases 
[Caenorhabditis elegans] 


2.8 


68 


U25111


Pisum sativum 
chloroplast 
processing enzyme 
mRNA, nuclear gene
encoding chloroplast 
protein, complete cds. 


4.8


1800145


(U83658) FH1/FH2 protein
homolog [Emericella nidulans]


1.6 


69


U00454 


Mus musculus Cdx-2
homeobox protein 
gene, complete cds. 


4.7


<NONE>


<NONE> 


<NONE> 


70 


M84166


Hamster c-Ha-ras
protein gene, 
complete cds. 


4.7


1710606


RENIN-BINDING PROTEIN
(RNBP) protein [Rattus
norvegicus]


0.88 


71 



AF087516 


Mus musculus major
sperm fibrous sheath
protein pro-
mAKAP82 gene,
alternative splice 
exons V and 1" 


4.6 


<NONE> 


<NONE> 


<NONE> 


72


X74160 


M.esculenta mRNA
for granule-bound
starch synthase 


4.6 


<NONE> 


<NONE> 


<NONE> 


73


M97487


Haloferax volcanii
superoxide dismutase
(sod2) gene, complete
cds. 


4.6 


2623307 


(AC002409) putative ubiquitin 
protease [Arabidopsis thaliana] 


3.4



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' -7* ■ - 


Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID


ACCESSION


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






74


M57889


Drosophila
melanogaster
suppressor of sable
gene, complete cds.


4.5 


<NONE> 


<NONE> 


<NONE> 


75 


D49708 


Rattus norvegicus 
mRNA for RNA 
binding protein 


4.5 


<NONE> 


<NONE> 


<NONE> 


76 


D31853 


Yeast GTS 1 gene for 
glycin-threonin/serine 
repeat protein, 
complete cds 


4.5 


2447195


(U42580) NETTF (7x), DETTS 
(4x) [Paramecium bursaria 
Chlorella virus 1] 


3.3 


77 


247036 


Human partial cDNA 
sequence, clone 
bs613; 


2.9 


<NONE> 


<NONE> 


<NONE> 


78 


L19660


Rattus norvegicus
gastric inhibitory
peptide receptor
mRNA, complete cds 


2.7 


2358279 


(AF007871) torsinA [Homo
sapiens] 


2e-07 


79 


X82841


A.thaliana Aco gene


2.6 


483212 


immediate-early protein IE110 -
human herpesvirus 1 (strain
HFEM) (fragment) 


8.4 


80 


X61931 


S.purpurascens famA 
and famB genes for 
FAS domain and acyl-
CoA-dehydrogenases,
respectively 


2.6 


2290534 


(U95031) sublingual gland 
mucin [Homo sapiens] 


0.47


81 


U13680


Human lactate 
dehydrogenase-C 
(LDH-C) mRNA, 
complete cds. 


2.5 


2887449


(AB007874) KIAA0414 [Homo 
sapiens] 


3.1 


82 


AB007869 


Homo sapiens 
KIAA0409 mRNA, 
partial cds 


2.4 


3130157 


(AB008859) pheromone 
receptor [Fugu rubripes]


5.4 


83 


X97479 


H. sapiens mas proto- 
oncogene. 5' region 


2.1 


<NONE> 


<NONE> 


<NONE> 


84 


X98374 


R.norvegicus mRNA 
for KIS protein 


1.9 


<NONE> 


<NONE> 


<NONE> 


85 


AE000710


Aquifex aeolicus 
section 42 of 109 of 
the complete genome 


1.9


<NONE> 


<NONE> 


<NONE> 



WO 01/02568 



PCT/US00/18374


Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION


DESCRIPTION 


P VALUE






86


D30612


Homo sapiens mRNA
for repressor protein,
partial cds


1.9 


<NONE> 


<NONE> 


<NONE> 


87 


Y14321 


Homo sapiens 
PMP69 gene, exons 
8 9 10 & 11 


1.9 


<NONE> 


<NONE> 


<NONE> 


88 


D90773 


E.coli genomic DNA, 
Kohara clone 
#262(30.3-30.5 min.) 


1.9 


1536816


(D78305) DNA binding protein 
[Chlorella virus] 


7.9 


89 


AE000991 


Archaeoglobus 
fulgidus section 116
of 172 of the
complete genome


1.9 


520645


(X79095)

pyruvate orthophosphate
dikinase [Flaveria trinervia]


2.7 


90 


U39476 


Rattus norvegicus 
p95 Vav (Vav) proto-
oncogene mRNA,
complete cds. 


1.9 


4158178 


(AL023496) hypothetical
protein 


1.6 


91 


U28838


Human transcription
factor TFIIIB 90 kDa
subunit


1.9


2495730 


HYPOTHETICAL PROLINE-
RICH PROTEIN KIAA0269
>gi|1665805|gnl|PID|d1014089
(D87459) Similar to Volvox
carteri extensin (S22697)
[Homo sapiens]


0.23


92 


U20106 


Rattus norvegicus 
synaptotagmin VII 
mRNA, complete cds.


1.9 


478380 


UL47h protein - Marek's disease 
virus 


0.23


93 


AF071010


Mouse mammary 
tumor virus putative 
integrase, env 
polyprotein, and 
superantigen mRNA,
complete cds


1.9


2781386


(AC004010) similar to Leucine- 
rich transmembrane proteins; 
44% similarity to U42767 
(PID:g1736918) [Homo
sapiens]


4e-33 


94 


AF061881


Mesocricetus auratus
c-fos proto-oncogene 
protein (c-fos) gene, 
complete cds 


1.8 


<NONE> 


<NONE> 


<NONE> 



95 


AE001397 


Plasmodium 
falciparum 
chromosome 2, 
section 34 of 73 of 
the complete 
sequence 


1.8 


<NONE> 


<NONE> 


<NONE> 



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Nearest Neighbor (BlasiN vs. Genbanle) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






96


D14701


Horseshoe crab
mRNA for
coagulation factor B,
complete cds


1.8 


<NONE> 


<NONE> 


<NONE> 


97 


M29154 


P. falciparum 
multidrug resistance 
(MDR) gene, 
complete cds. 


1.3 


<NONE> 


<NONE> 


<NONE> 


98 


L16532 


Rattus norvegicus
(clone pCNPII) 2',3'-
cyclic nucleotide 3'-
phosphodiesterase
(CNPII) mRNA,
complete cds.


1.8


<NONE> 


<NONE> 


<NONE> 


99 


AE001434 


Plasmodium
falciparum
chromosome 2, 
section 71 of 73 of 
the complete 
sequence 


1.3 


<NONE> 


<NONE> 


<NONE> 


100 


Z46785 


D.melanogaster gene 
for protamine 
(mst35Bb). 


1.8 


<NONE> 


<NONE> 


<NONE> 


101 


X69822 


P.sylvestris mRNA
for glutamine
synthetase


1.8


219896 


(D90452) l-caldesmon I [Homo 
sapiens]


9.7 












102


U49055


Rattus norvegicus
CTD-binding SR-like
protein rA8 mRNA,
complete cds


1.8


2497252


INSULIN-LIKE GROWTH
FACTOR BINDING PROTEIN
4 (IGFBP-4) (IBP-4) (IGF-
BINDING PROTEIN 4) factor-
binding protein-4 - sheep
(fragment) factor-binding
protein-4, IGFBP-4 [sheep,
liver, Peptide, 237 aa] [Ovis
aries]


2.5 


103 


L28101 


kallistatin (PI4) gene, 
exons 1-4, complete 
cds 


1.8 


4204267 


(AC005223) 55585 
[Arabidopsis thaliana] 


2.4 


104 


U66987 


Pandorina morum 
internal transcribed 
spacer 1, 5.8S
ribosomal RNA gene, 
and internal 
transcribed spacer 2, 
complete sequence 


1.8 


2635909 


(Z99121) permease [Bacillus
subtilis] 


1.9 
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PCT/US00/18374 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ
ID


ACCESSION


DESCRIPTION


P VALUE


ACCESSION


DESCRIPTION 


P VALUE 




105


X58033


Human polymorphic
MspI site DNA
(D3S3 locus)


1.8 


2136878 


keratin KAP5.5 - sheep 
(fragment) >gi|313722 


0.65 


106 


U15780 


Human p82 (ST5) 
mRNA, alternatively 
spliced, complete cds 


1.8 


3638957 


(AC004877) sco-spondin-mucin-
like; similar to P98167 uncertain 
[Homo sapiens] 


0.64 


107 


AF038535 


Homo sapiens 
synaptotagmin VII 
mRNA, partial cds 


1.8 


457927 


(U00690) calcium channel alpha 
1 subunit [Drosophila 
melanogaster]


0.51 


108 


AF052134 


Homo sapiens clone 
23585 mRNA 
sequence 


1.8 


232263 


HOMEOBOX PROTEIN HOX-
D1 (HOX-4.9)


0.28 


109 


X75208 


H. sapiens HEK2 
mRNA for protein 
tyrosine kinase 
receptor. 


1.8 


1730198 


GROWTH-ARREST-SPECIFIC
PROTEIN 1 gene product
[Homo sapiens]


0.22


110 


AB013896


Xenopus laevis 
mRNA for SOX-D, 
complete cds 


1.8 


2494501 


TRANSCRIPTION FACTOR 
FKH-4 factor [Mus musculus] 


0.17


111 


D16947


Human HepG2 3' 
region cDNA, clone 
hmd6bl0 


1.8 


3413870


(AB007923) KIAA0454 protein
[Homo sapiens] 


0.002 


112 


D13547


Mouse DNA, T early 
alpha (TEA) region 


1.8


3393018


(AL031174) hypothetical
protein 


5e-08 


113 


M35498 


Woodchuck c-myc 
protein gene, exon 1. 


1.8 


3183405 


HYPOTHETICAL 11.3 KD
PROTEIN C2C6.07 IN
CHROMOSOME I
>gi|2370504|gnl|PID|e339194
pombe]

>gi|3451305|gnl|PID|e1316730
(AL031324) very hypothetical
protein [Schizosaccharomyces
pombe]


8e-10


114 


M84166 


Hamster c-Ha-ras 
protein gene, 
complete cds. 




3386622 


(AC004665) unknown protein
[Arabidopsis thaliana]


2e-10 


115 


U33135 


Mychodea carnosa 
18S ribosomal RNA 
gene, complete 
sequence 


1.8 


3334982 


(AC005306) R27216J [Homo 
sapiens] 


3e-22


116


U84003


Homo sapiens 
putative tumor 
suppressor (BIN1) 
gene, exons 7-12 


1.7 


<NONE> 


<NONE> 



<NONE> 
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Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















117 


AE001121 


Borrelia burgdorferi 
(section 7 of 70) of 
the complete genome 


1.7 


<NONE> 


<NONE> 


<NONE> 


118 


AE001114 


Archaeoglobus 
fulgidus section 165 
of 172 of the 
complete genome 


1.7 


<NONE> 


<NONE> 


<NONE> 


119 


U82064 


Angiostrongylus 
cantonensis adult- 
specific muscle 
protein* 1 gene, partial 
cds 


1.7 


<NONE> 


<NONE> 


<NONE> 


120 


AF041836 


Buchnera aphidicola 
plasmid pLeu-Sg, 
complete plasmid 
sequence 


1.7 


<NONE> 


<NONE> 


<NONE> 


121 


M87479 


Lymnaea stagnalis 
FMRFamide gene,
mature peptides. 


1.7 


<NONE> 


<NONE> 


<NONE> 


122 


M55163 


Xenopus laevis 
fibroblast growth 
factor receptor 
mRNA, complete cds. 


1.7 


<NONE> 


<NONE> 


<NONE> 




123


SS7«S


histamine H2-
receptor [rats,
Genomic, 1928 nt]


1.7


<NONE> 


<NONE> 


<NONE> 


124 


M27256 


Simian 

immunodeficiency 
virus (SIV) pol 
region.


1.7 


<NONE> 


<NONE> 


<NONE> 


125 


U31516 


Human chromosome 
8 anonymous clone 
pBS8-165 


1.7 



<NONE> 


<NONE> 


<NONE> 


126 


X12671


Human gene for
heterogeneous
nuclear
ribonucleoprotein
(hnRNP) core protein
A1


1.7 


<NONE> 


<NONE> 


<NONE> 


127 


AF009054 


Paeonia suffruticosa
ssp. spontanea
alcohol
dehydrogenase 1B
(Adh1B) gene, partial
cds 


1.7 


<NONE> 


<NONE> 


<NONE> 
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ijT^J Nearest Neiehbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















128 


AF046917 


Mus musculus
transketolase gene, 
exon 6 and partial cds 


1.7 


<NONE>


<NONE> 


<NONE> 


129 


D89053 


Homo sapiens mRNA 
for Acyl-CoA 
synthetase 3, 
complete cds 


1.7


<NONE>


<NONE> 


<NONE> 


130 


U57968 


Staphylothermus 
marinus surface layer-
associated STABLE 
protease gene, 
complete cds. 


1.7


<NONE>


<NONE> 


<NONE> 


131 


L39072 


Bovine herpesvirus 1 
(clone p95) UL24 
homologue gene, 
complete cds. 


1.7 


<NONE> 


<NONE> 


<NONE> 


132 


X04980 


Drosophila simulans 
retrotransposon 297 
5'-LTR and flanks 
(pWK1020) 


1.7 


<NONE> 


<NONE> 


<NONE> 


133


AE001114


Archaeoglobus 
fulgidus section 165 
of 172 of the 
complete genome 


1.7 


<NONE> 


<NONE> 


<NONE> 


134


X04434


Human mRNA for
insulin-like growth
factor I receptor 


1.7 


<NONE> 


<NONE> 


<NONE> 


135 


U07890 


Mus musculus
C57BL/6J epidermal
surface antigen 
(mesa) mRNA, 
complete cds. 


1.7 


<NONE> 


<NONE> 


<NONE> 


136 


D26163 


Human tyrosinase 
gene, 5-flanking 
region cell-specific 
transcription) 


1.7 


<NONE> 


<NONE> 


<NONE> 


137 


AF093818


Panorpa nipponensis 
NADH 

dehydrogenase 
subunit 5 gene, 
mitochondrial gene 
encoding
mitochondrial 
protein, partial cds 


1.7 


<NONE> 


<NONE> 


<NONE>
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Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















138 


D50560 


mRNA for 
cytochrome P-450, 
complete cds 


1.7 


<NONE> 


<NONE> 


<NONE> 


139 


AF083488 


Mus musculus 
phospholipase D1
(PLD1) gene, exons 
18 and 19, complete 
sequence 


1.7 


<NONE> 


<NONE> 


<NONE> 


140 




Mus musculus 
Pontin52 mRNA, 


1.7


<NONE>






141 


M73749 


Streptococcus
salivarius
thermophilus beta-D-
galactose (lacZ) gene,
complete cds. > ::
gb|M63636|STRLAC
ZZ Streptococcus
thermophilus beta-D-
galactosidase (lacZ)
gene, complete cds.


1.7 


<NONE> 


<NONE> 


<NONE> 


142 


AE001114 


Archaeoglobus
fulgidus section 165
of 172 of the
complete genome


1.7 


2183023 


(U84971) unknown [Homo 
sapiens] 


9.2 








143


L01983


Human type IV
sodium channel alpha
polypeptide


1.7


130504


GENOME POLYPROTEIN
[CONTAINS: N-TERMINAL
PROTEIN (P1); HELPER
COMPONENT PROTEINASE
INCLUSION PROTEIN (CI); 6
KD PROTEIN 2 (6K2);
GENOME-LINKED PROTEIN
(VPG); NUCLEAR ... virus
(strain D)


9.2


144 


L19731 


Plecotus rafinesquii 
mitochondrial 
cytochrome b gene. 5' 
end. 


1.7 


3327096 


(AB014541) KIAA0641 protein
[Homo sapiens] 


9.1 


145 


AE001114 


Archaeoglobus 
fulgidus section 165
of 172 of the 
complete genome 


1.7 


2183023 


(U84971) unknown [Homo 
sapiens] 


8.8 
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. , r ^??; ; : | Nearest Nei shbor (B lastN vs. Gen bank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ
ID


ACCESSION


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 














146 


L27218


Bos taurus serum 
amine oxidase 
mRNA, complete cds. 
> oxidase=amiloride- 
binding protein 
homolog [cattle, liver,
mRNA, 2664 nt] 


1.7 


1174459 


SIGNAL TRANSDUCER AND 
ACTIVATOR OF 
TRANSCRIPTION 6 (IL-4 
STAT) >gi|559855 (U16031) IL-
4 Stat [Homo sapiens]


7.1 


147 


Z49868 


Caenorhabditis
elegans cosmid
W07E11, complete
sequence
[Caenorhabditis
elegans]


1.7 


4204263 


(AC005223) 40409 
[Arabidopsis thaliana]


6.7 


148 


AL022271 


Caenorhabditis
elegans cosmid
F32F2, complete
sequence
[Caenorhabditis
elegans]


1.7 


2497969 


PERIPLASMIC NITRATE 
REDUCTASE PRECURSOR 
>gi|1086107|pir||S50163 nitrate 
reductase large chain precursor, 
periplasmic - Thiosphaera 
pantotropha >gi|600093 
(Z36773) periplasmic nitrate 
reductase large subunit 
[Paracoccus denitrificans] 


6.7 


149 


U43844 


Mus musculus cyclin
D3 gene, complete
cds 


1.7 


3861490 


(AF062037) capsid protein
precursor [Thosea asigna virus]


5.1


150 


Z25464 


S.cerevisiae UNFl.
LTVi.MRP8.CYB3 
and TGLI genes, 
complete CDS's 


1.7 


1255404 


(U53151) weak similarity to 
cytochrome b [Caenorhabditis 
elegans]


4.1 


151 


U77846 


Human elastin gene, 
partial cds and partial
3'UTR


1.7


3355682 


(AL031124) putative secreted
lyase


4.0 


152 


X62880 


S.scrofa mRNA for 
calcium release 
channel (CRC) 


1.7


3327080


(AB014533) KIAA0633 protein
[Homo sapiens]


4.0 


153 


Y00067 


Human gene for 
neurofilament subunit 
M (NF-M) 


1.7 


479829


heterogeneous ribonuclear
particle protein homolog -
Caenorhabditis elegans 
similarity to RNA recognition 
motifs [Caenorhabditis elegans] 


3.9 
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SEQ 
ID 



154 



155 



156 



157 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



159 



160 



X68393 



AB012284 



M96633 



U49055 



Y15909



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



D.melanogaster gene 
for Beta-tubulin,
exons 1 and 2 



Shuttle vector 
pAUR123 gene for 
Aur.l-C, complete cds 



Rattus norvegicus 
mitochondrial 
intermediate 
peptidase (MIP) 
mRNA, complete cds



Rattus norvegicus 
CTD-binding SR-like 
protein rA8 mRNA, 
complete cds 



Mus musculus mRNA 
for myc- in non- 
protein - 1 — 



U67600 



AF013759 



Methanococcus 
jannaschii section 142 
of 150 of the 
complete genome 



Homo sapiens 
calumenin (Calu)
mRNA, complete cds



1.7 



1.7 



1.7 



1.7 



1.7 



1.7 



P VALUE 



2342682 



417704 



2314209 



2497252 



917776 



2982355 



2982355 



(AC000106) Contains similarity 
to Rattus AMP-activated protein
kinase (gb|X95577). 
[Arabidopsis thaliana] 



POL POLYPROTEIN 
(ORF1A/1B) [CONTAINS:
RNA-DIRECTED RNA
POLYMERASE; HELICASE;
PROTEASE]



(AE000613) H. pylori predicted
coding region HP1054



INSULIN-LIKE GROWTH
FACTOR BINDING PROTEIN
4 (IGFBP-4) (IBP-4) (IGF-
BINDING PROTEIN 4) factor- 
binding protein-4 - sheep 
(fragment) factor-binding 
protein-4, IGFBP-4 [sheep, 
liver, Peptide, 237 aa] [Ovis 
aries] 



iduronate-2-sulfatase, IDS {EC
3.1.6.13} Peptide Mutant, 550
aa]



(AF052252) fork head domain 
protein FKD9 [Danio rerio] 



(AF052252) fork head domain 
protein FKD9 [Danio rerio]



3.8 



3.8 



3.1 



3.0 



3.0



3.0 



2.9 
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: :a^:;,-J Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















161 


AF062915 


Arabidopsis thaliana 
putative transcription 
factor (MYB90) 
mRNA, complete cds


1.7


3878065 


Human mRNA product
KIAA0077 (TR:Q14997);
cDNA EST yk243h8.5 comes
from this gene; cDNA EST
yk243h8.3 comes from this
gene; cDNA EST yk359h4.5
comes from this gene
[Caenorhabditis elegans]
>gi|3880318|gnl|PID|e1349839
(Zo 1 133) similarity to Human
mRNA product KIAA0077
(TR:Q14997); cDNA EST
yk243h8.5 comes from this
gene; cDNA EST yk243h8.3
comes from this gene; cDNA
EST yk359h4.5 comes from this
gene


2.3 


162 


X87526 


H.sapiens genomic 
DNA (chromosome 
3: clone NL3003R) 


1.7 


3638957 


(AC004877) sco-spondin-mucin- 
like; similar to P98167 uncertain 
[Homo sapiens] 


2.3 


163 


AC005573


Homo sapiens 
chromosome 5, PAC 
clone 202el3 


1.7 


2465540 


(AF005632) phosphodiesterase 
I/nucleotide pyrophosphatase 
beta [Homo sapiens] 


1.8 


164 


D83402 


Homo sapiens gene 
for prostacyclin 
synthase, exon 10 and 
complete cds 


1.7 


627608 


steroid hormone receptor TK3 - 
human sapiens] 


1.7 




165 


AF053700 


Homo sapiens deltex 
(Dx) mRNA, 
complete cds 


1.7 


2662089 


(AB007864) KIAA0404 [Homo 
sapiens] 


1.7 


166


AF043225 


Mus musculus 6- 
pyruvoyl- 
tetrahydropterin 
synthase (Pts) 
mRNA, complete cds


1.7 


2352538 


(AF006564) alcohol 
dehydrogenase [Drosophila 
persimilis] persimilis] 


1.4 
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Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















167 


U52917 


Thermus aquaticus
thermophilus NADH
dehydrogenase I
subunits NQO7,
NQO6, NQO5,
NQO4, NQO2,
NQO1, NQO3,
NQO8, NQO9,
NQO10, NQO11,
NQO12, NQO13, and
NQO14, complete
cds.


1.7 


2564334 


(AB006631) The human
homolog of mouse Cux-2
[Homo sapiens] 


1.0 


168 


X72222 


M.musculus gene for 
serotonin 2 receptor 


1.7 


3875796 


Similarity to Yeast 
hypothetical YIK9 protein 
(SW:YIK9_YEAST); cDNA
EST EMBL:T01252 comes
from this gene; cDNA EST
EMBL:D33205 comes from this
gene; cDNA EST
EMBL:D33955 comes from this
gene; cDNA EST 
EMBL:D35484 co... 


1.0 


169 


U23186 


Crotalus scutulatus

PLA2-like

pseudogene


1.7


853971 


(X83413) DR5 [Human 
herpesvirus 6] >gi|853972 
(X83413) DR5 [Human 
herpesvirus 6]


0.99 
















170 


M83118 


Mus musculus factor 
VIII-associated
protein (f8a) mRNA, 
complete cds. 


1.7 


3201617


(AC004669) hypothetical 
protein [Arabidopsis thaliana] 


0.80 


171 


M38347 


E.coli ATP- 
dependent proteinase 
(lon) gene, complete
cds. 


1.7 


4140322 


rAL03l2S2) dimkUJl (Cell 
Division Cycle 2-Like 2 
(PITSLRE, p58/GTA, 
Galactosyltransferase 
Associated Protein Kinase)) 
(isoform beta 2-2) [Homo 
sapiens] 


0.78 


172 


U28838


Human transcription 
factor TFIIIB 90 kDa 
subunit 


1.7 


2495730 


HYPOTHETICAL PROLINE-
RICH PROTEIN KIAA0269
>gi|1665805|gnl|PID|d1014089
(D87459) Similar to Volvox
carteri extensin (S22697)
[Homo sapiens] 


0.62 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ
ID


ACCESSION


DESCRIPTION


P VALUE


ACCESSION


DESCRIPTION


P VALUE


173 


U72487


Rattus norvegicus
calcium-independent
alpha-latrotoxin
receptor mRNA, 
complete cds 


1.7 


544411


GLYCOPROTEIN GP100
PRECURSOR (P29F8)
discoideum]


0.35 


174 


AE000718 


Aquifex aeolicus 
section 50 of 109 of 
the complete genome


1.7


2497569


FIBROBLAST GROWTH 
FACTOR RECEPTOR 3 
PRECURSOR (FGFR-3) 
(HEPARIN-BINDING
GROWTH FACTOR
RECEPTOR)
>gi|2117851|pir||I55363
fibroblast growth factor receptor
3 - mouse >gi|399145 (M81342)
fibroblast growth factor receptor
3 [Mus musculus]


0.34


175 




Oryza sativa GDP 
dissociation inhibitor 
protein OsGDI2 
(OsGDI2) mRNA, 
complete cds 


1.7 


125362 


MACROPHAGE COLONY
STIMULATING FACTOR I
RECEPTOR PRECURSOR
(CSF-1-R) (FMS PROTO-
ONCOGENE) (C-FMS) factor 1 
receptor - cat >gi|163855 
(J03149) M-CSF receptor [Felis 
domesticus] 


0.34 


176 


U95102


Xenopus laevis
mitotic
phosphoprotein
mRNA, complete cds


1.7 


85058


muscarinic acetylcholine
receptor - fruit fly acetylcholine
receptor [Drosophila
melanogaster]


0.20 


177 J 


C 
n 

AF077352 h 


^nlamydomonas 
einhardtii myosin 
eavv chain 


1.7 J 


\ 

i 

\ 

( 

P 
P 

728901 h 

(t 
[C 

746516 |>< 


■UJROSOMAL PkOlMN 

10 PRECURSOR SP-10 - 
vestern baboon 
►gi|298488|bbs|127113 
S56458) SP-lCHntraacrosomal 
rote in [Papio papio=baboons, 
eptide, 285 aa] [Papio 
amadryas] 

J23517) D1022.7 
laenorhabditis elegans] 
2iJ3258651 elesansl 


0.20 

\ I 
0.068 I 


178

Z92788

Caenorhabditis
elegans cosmid
F53B8, complete
sequence

[Caenorhabditis
elegans]


1.7






ACCESSION 



DESCRIPTION 



AF002217



D30749 



P VALUE 



Ralstonia eutropha
megaplasmid pHG1
nitric oxide reductase 
(norB) gene, 
complete cds 



Rat mRNA for 
protein tyrosine 
phosphatase 



M15202



Rat fast skeletal TnT 
gene encoding 
troponin T isoforms, 
complete cds. 



L07592 



Human peroxisome 
proliferator activated 
receptor mRNA, 
complete cds.



1.7 



Dendrobium
crumenatum ACC
synthase gene,
U64031 complete cds
Homo sapiens
docking protein
(DOK-2) mRNA,
AF034970 complete cds



1.7 



ACCESSION 



1143538 



1228035 



731172 



4033414 



3122885 



DESCRIPTION 
(XiS7883) mitochondrial capsule



P VALUE 



selenoprotein [Rattus
norvegicus] >gi|1354135
(U48702) mitochondria
associated cysteine-rich protein 
SMCP 

(D83776) The KIAA0191 gene 
is expressed ubiquitously.; The
KIAA0191 protein retains the
C2H2 zinc-finger at its N-
terminal region. [Homo sapiens]



0.039 



0.008



SKIN SECRETORY PROTEIN 
XP2 PRECURSOR | 4e-04



Z12839 



L.longiflorum
encoding calmodulin. 

> :: 

gb|L18912|LILCALMODU
Lilium
longiflorum 
calmodulin mRNA, 
complete cds. 



17?" 



2289U97 



25H747 



PUTATIVE IMPORTIN BETA-
4 SUBUNIT

ASPARTYL-TRNA 
SYNTHETASE synthetase 
[Bacillus subtilis]



(U78737) 

alpha(1,3)fucosyltransferase



2e-06 



2e-11



[Cricetulus griseus]



8e-12



(AF023270) probable
transcriptional regulator dre4 



4e-12



fc-7 






Nearest Neighbor (BlastN vs. Genbank)

SEQ
ID

ACCESSION




DESCRIPTION 



186 



X53459 



Equine arteritis virus 
(EAV) RNA genome 

> :: 

emb|A45589|A45589
Sequence 1 from
Patent WO9519438 >

emb|A58849|A58849
Sequence 1 from
Patent WO9700963 >

gb|AR013959|AR013959
Sequence 1 from
Patent US 5773235



2. 



187 K02668



E. coli ddl gene 
encoding D-alanine:D-
alanine ligase and
ftsQ and ftsA genes,
complete cds, and
ftsZ gene, 5' end.



188 AB008375



Homo sapiens mRNA 
for osteoblast specific 
cysteine-rich protein. 
complete cds 



189 



L36603 



Pseudomonas cepacia 
(clonePsudom70-I) 
heat shock protein 70 
(hsp70) gene, 
complete cds 



P VALUE

Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



LI 



3979817 



3879121 



1.7 



2496945 



1.7 



2661842 



/St 



■ (270033) WioJi Miitm 



Human tyrosine-protein kinase
CSK (SW:CSK_HUMAN);
cDNA EST EMBL:C10908
comes from this gene; cDNA
EST EMBL:C12822 comes
from this gene; cDNA EST
yk408c2.3 comes from this
gene; cDNA EST yk408c2.5 ...
Human tyrosine-protein kinase
CSK (SW:CSK_HUMAN);
cDNA EST EMBL:C10908
comes from this gene; cDNA
EST EMBL:C12822 comes
from this gene; cDNA EST
yk408c2.3 comes from this
gene; cDNA EST yk408c2.5 ...



(Z70310) predicted using
Genefinder; Similarity to Mouse
ankyrin (PIR Acc. No. S37771);
cDNA EST EMBL:T01923
comes from this gene; cDNA
EST EMBL:D32335 comes
from this gene; cDNA EST
EMBL:D32723 comes from this
gene; cDNA ES... Genefinder;
Similarity to Mouse ankyrin
(PIR Acc. No. S37771); cDNA
EST EMBL:T01923 comes
from this gene; cDNA EST
EMBL:D32335 comes from this
gene; cDNA EST
EMBL:D32723 comes from this
gene; cDNA ES

HYPOTHETICAL 55.9 KD 
PROTEIN EEED8.6 IN 
CHROMOSOME II >gi|733603 
(U23484) No definition line 
found [Caenorhabditis elegans] 



(Y15732) DNA polymerase beta
[Xenopus laevis] 



P VALUE 



1e-14



2e-19



1e-19



6e-20 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



190 Z49760

P.blakesleeanus
mRNA GTP

cyclohydrolase I



191 U52428

Human fatty acid
synthase gene, partial 
cds 



192 U12767



Human mitogen 
induced nuclear 
orphan receptor 



193 Z63478

194 AF084375



_ 195 | AE00U14 



_19fi I AFQ8fl335- 



H.sapiens CpG DNA 
clone 85a12, forward
read cpg85a12.ft1a.



Homo sapiens 
inversin protein 
exons 8 and 9 



Archaeoglobus 
fulgidus section 165 
of 172 of the
complete genome 



Homo sapiens
inversin protein.



197 U24217

198 AE000580



Kluyveromyces lactis 
RNA polymerase II 
largest subunit gene, 
partial cds 



Helicobacter pylori 
26695 section 58 of 
134 of the complete 



genome 



P VALUE 



1.7 



1.6 



1.6 



1.6 



1.6 



1.6 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



1731181 



1.6 



4226073



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



DESCRIPTION 



HitFUiHfcUl'AL ttoKU 

ltlNL14A4.3IN 

CHROMOSOME II

>gi|3874230|gnl|PID|e1351618
protein (Swiss Prot accession
number P38376); cDNA EST
yk220e10.5 comes from this
gene [Caenorhabditis elegans]



P VALUE 



(AF125443) contains similarity
to S. pombe phosphatidyl
synthase (GB:Z28295)
[Caenorhabditis elegans]



<NONE> 



3e-21 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



6e-25 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 






Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



199 



X62083 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



P VALUE ACCESSION 



H.sapiens mRNA for 
Drosophila female 
sterile homeotic 
(FSH) homologue > ::
gb|M80613|HUMFSHG
Human homolog
of Drosophila female 
sterile homeotic 
mRNA, complete cds. 



200 M28064 



201 



Plasmodium
brasilianum DNA
homologous to the
histidine-rich knob 
protein region of 
Plasmodium 
falciparum. 



DESCRIPTION 



202 



U03114 



Streptomyces albus
lipase precursor (lip)
gene, complete cds,
and unidentified 5'
ORF and 3' ORF,
partial cds.



Strix varia oocyte 
maturation factor 
Mos (c-mos) proto- 
U88422 oncogene, partial cds 



203 



204 



205 



Human pulmonary

surfactant-associated
protein SP-A

(SFTP1) gene,
M68519 complete cds. 



Homo sapiens
transcription factor 
AF044575 POU4F3 



206 



Homo sapiens
(subclone 3_e10 from
P1 H21) DNA
L4S476 sequence. 



M18630 



Rat CNS 2',3'-cyclic
nucleotide 3'-

phosphodiesterase



P VALUE 



<NONE> 



457495 



1.6 



3638957 



137618 



1.6 



3875423 



2133625 



3687297 



1.6 



<NONE> 



(M26647) ORF X 
[Saccharomyces cerevisiae] 



(AC004877) sco-spondin -mucin 
like; similar to P98167 uncertain 
[Homo sapiens]



<NONE> 



8.4 



VITAMIN D3 RECEPTOR 
(VDR) receptor [Rattus 



(Z38112) E03A3.6
[Caenorhabditis elegans] 

GABA transport protein
tobacco hornworm 



4.7 



(AJ005588) 5-epi-aristolochene 
synthase



3880315 



(Z81133) Similarity to Human
mRNA product KIAA0077
(TR:Q14997) [Caenorhabditis
elegans] 






SEQ
ID

Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



207 AF027174

208 U53448



P VALUE 



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath 
B) mRNA, complete 

cds

Babesia microti heat 
shock protein 70 
(hsp70) gene, 
complete cds 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)

ACCESSION



DESCRIPTION 



P VALUE 



1.6 



267068 



1.6 



1255429 



209 AF084367 



210 D55635



Homo sapiens 
inversin protein 
mRNA, complete cds 



211 AF035756

212 X73479



Yeast dis1+ gene for
p93dis1, complete

cds 

Streptomyces sp. 2- 
dehydro-3- 
deoxyphosphohepton 
ate aldolase gene. 
partial cds 



O.cuniculus rPTPA 
mRNA 



1.6 



1.6 



1730076 



TUMOR-ASSOCIATED 
ANTIGEN L6 



(U53155) strong similarity to 
the carboxyl two-thirds of valyl- 
tRNA synthetases 
[Caenorhabditis elegans]



PROBABLE

SERINE/THREONINE-

PROTEIN KINASE CY49.28

>gi|1370255|gnl|PID|e247094
(Z73966) pknJ



3128353 



H.sapiens mRNA for
213 X98330 ryanodine receptor



214 X64194 



P.anserina FMR1
gene exons 1 and 2



215 Z92788

Caenorhabditis
elegans cosmid 
F53B8, complete 
sequence 

[Caenorhabditis 
elegans] 



1.6 
1.6 



853971 
3413810 



(AF010496) maltose transport 
inner membrane protein 



(X83413) DR5 [Human 
herpesvirus 6] >gi|853972
(X83413) DR5 [Human

herpesvirus 6]

(Y17034) Bassoon [Mus
musculus] 



1.6 



2072986 



(U95142) putative G-protein-
coupled receptor G-protein-
coupled receptor [Arabidopsis



thaliana] 



1.6 



128014 



1.6 



216 



Methanobacterium
thermoautotrophicum
from bases 1098905
to 1112186 (section
94 of 148) of the
AE000888 complete genome



746516 



NECDIN >gi|9i I29|pir||jN0148 
necdin, brain - mouse
>gi|200020 (M80840) necdin
[Mus musculus] 



(U23517) D1022.7
[Caenorhabditis elegans]
>gi|3258651 elegans] 



1.6 



462415 



INTERFERON-ALPHA/BETA
RECEPTOR ALPHA CHAIN 
PRECURSOR (IFN-ALPHA- 
REC) >gi|346520|pir||S27387 
interferon alpha receptor type 1 - 
bovine >gi|432 



3.6 



2.2 



1.2



1.2 



0.97 



0.94 



0.73 



0.42 



0.19 



0.001 






Nearest Neighbor (BlastN vs. Genbank)



218 



219 



220 



221 



222 



223 



224 



ACCESSION 



DESCRIPTION 



Homo sapiens mRNA
for osteoblast specific
cysteine-rich protein,
217 AB008375 complete cds



Orang-utan involucrin 
M25312 gene, complete cds. 



Cyprinus carpio 
mRNA for MyoD, 
AB012882 complete cds



Caenorhabditis 
elegans cosmid
U29487 C09C7 



1.6 



1.6 



1.5 



1.5 



M.musculus mRNA 
X74760 for Notch 3 



U72396 



U42391 



M92296 



225 X94144



Lycopersicon 
esculentum class II 
small heat shock 
protein Le-HSP17.6
mRNA, complete cds 



1.5 



Human myosin-IXb
mRNA, complete cds 



Pongo pygmaeus
gamma-1 and gamma-
2 globin genes, 
complete cds. 



C.japonica mRNA for
QNR-71 protein



13 



1.5 



13 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



P VALUE I ACCESSION 



2496945 



3875131



<NONE> 



<NONE> 



1364094 



121855 



3688428 



DESCRIPTION 



P VALUE 



HYPOTHETICAL 55.9 KD 

PROTEIN EEED8.6 IN 
CHROMOSOME II >gi|733603 
(U23484) No definition line 
found [Caenorhabditis elegans]
(Z70750) similar to vanadate
resistance protein
transmembranous domains
[Caenorhabditis elegans]



1e-18



3e-26 



<NONE> 



<NONE> 



integral membrane protein - 
Streptomyces pristinaespiralis 
>gi|872306 (X84072) integral 
membrane protein 
[Streptomyces pristinaespiralis]



186413 



2745737 



EXOGLUCANASE II
PRECURSOR cellulose 1,4-beta
cellobiosidase (EC 3.2.1.91) II
precursor - fungus (Trichoderma 
reesei) 1,4-beta-cellobiosidase 
(EC 3.2.1.91) II - fungus 
cellobiohydrolase II 
[Trichoderma reesei] 



<NONE> 



(AJ011534) sucrose synthase

(M13144) inhibin A [Homo
sapiens] 



(AF029791) UDP- 
Gal:betaGlcNAc beta 1,3- 
galactosyltranferase-II [Mus 
musculus] 



4.2 



0.22 



3e-08 






Nearest Neighbor (BlastN vs. Genbank) 



226 AB014557 protein, partial cds



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)




Homo sapiens mRNA 
for KIAA0657 



227 AF000948

228 AF057287

229 U38951

230 AF027148



Borrelia burgdorferi 
oligopeptide 
permease homolog 
OppAIV (oppAIV) 
gene, complete cds
Mus musculus
RAB/Rio protein
mRNA, partial cds
Drosophila 
melanogaster 
vacuolar ATPase 
subunit £ 



1.5 



1212992 



231 AF079310



Homo sapiens 
myogenic 

determining factor 3 
Mus musculus histone
deacetylase 3 
(Hdac3) gene, exons 
4 through 15 and 
complete cds 



(X90568) Protein sequence and 
annotation available soon via 
Swiss- Prot; available at present 
via e-mail from 

LABEIT@EMBL- 
Heidelberg.DE [Homo sapiens]



4e-13 



1.3 



1.3 



<NONE> 



2498005 



1.1 



1.1 



<NONE> 



3172134 



P.radiata lac gene for 



<NONE> 



MYC PROTO-ONCOGENE 
PROTEIN (C-MYC) proto- 
oncogene [Sus scrofa]



<NONE> 

(U90209) RNA polymerase II 
largest subunit [Bonnemaisonia
hamifera]



<NONE> 



2.6 



1.0 



1657601 



-MS- 



99 6020 



<NONE> 



(U66220) unknown 
[Nannocystis exedens]



(X91638) BRM protein [Gallus 



2.3 



0.25 



233 D89016



Human mRNA for 
Neuroblastoma, 
complete cds 



gallus]



0.93 



234 X76392

C.familiaris VIP36
(vesicular integral - 
membrane protein of 
36 kDa) mRNA 



<NONE> 



0.93 



235 AF100694

Mus musculus
Pontin52 mRNA,
complete cds 



4176446 



(AL022238)dJ1042K10.2,l 
(novel protein with probable 
rabGAP domains and Src 
homology domain 3)



<NONE> 



0.90 



<NONE> 



<NONE> 



7e-81 



<NONE> 






Nearest Neighbor (BlastN vs. Genbank)

ACCESSION DESCRIPTION P VALUE ACCESSION



237 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



236 AE000991



Z35922 



Archaeoglobus 
fulgidus section 116
of 172 of the 
complete genome 



S.cerevisiae 
chromosome II 
reading frame ORF 
YBR053c 



238 U47331 



239 X72810

Rattus norvegicus
metabotropic
glutamate receptor 4b
mRNA, complete cds.



H.sapiens Ig germline
kappa-chain gene 
variable region (L3) 



240 Z11700 



241 



U71597 



Escherichia coli
genes faeG, faeH,
faeI, faeJ and IS629-
like insertion
sequence. > ::

emb|Z11710|ECFAEHIJ
E.coli faeH, faeI
and faeJ genes
encoding FaeH, FaeI
and FaeJ proteins

Phrynosoma
douglassii NADH
dehydrogenase
subunit 4 (ND4)
gene, mitochondrial
gene encoding
mitochondrial
protein, partial cds



DESCRIPTION 



0.90 



1176579 



<NONE> 



0.82 



1550703 



0.69 



3023063 



P VALUE 



(fcAKJLV til 1KANSUUH ' 2) 
>gi|1362345|pir||S55862 
probable membrane protein 
YNL327w - yeast 

(Saccharomyces cerevisiae) 
cerevisiae]

>gi|1302445|gnl|PID|e239572
(Z71603) ORF YNL327w
[Saccharomyces cerevisiae]



<NONE> 



(AF052587) F14 [Xylella
fastidiosa] 



0.69 



0.65 



6.9 



<NONE> 



(Z80225) hypothetical protein 
Rv2662 4.1





(AC002338) laccase isolog




2347188 


[Arabidopsis thaliana] thaliana]


3.9


<NONE> 


<NONE> 


<NONE> 






SEQ
ID

Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



242 Z77798



Ammonia species 
LSU rRNA gene 
(partial; isolate Tr S 
5; clone 16) 



243 D25542



Human mRNA for 
golgi antigen gcp372, 
complete cds 



Cow dopamine 
transporter mRNA,
244 M80234 putative cds.



245 



Homo sapiens mRNA
for KIAA0449
AB007918 protein, partial cds



Human U266 
rearranged DNA for
lambda-

immunoglobulin light
246 X31754 chain



247 



Helicobacter pylori, 
strain J99 section 115
of 132 of the
AE001554 complete genome



H.sapiens CpG DNA,
clone 96e7, reverse
248 Z64067 read cpg96e7.rt1a.



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



0.64 



Pinus sylvestris 
microsatellite DNA, 
249 AJ223768 clone SPAC 11.5



0.64 



0.64 



0.64 



0.63 



0.62 



0.62 



0.62 



ACCESSION 



DESCRIPTION 



1174506 



1 1 1230 



3874972 



2833239 



207230 L 



<NONE> 



<NONE> 



<NONE> 



GLUTAMYL-TRNA

SYNTHETASE glutamate-
tRNA ligase (EC 6.1.1.17) -
Haemophilus influenzae (strain 
Rd KW20) >gi|1573240 
(U32713) glutamyl-tRNA 
synthetase (gltX) [Haemophilus
influenzae Rd] 



P VALUE 



ultra-high-sulfur keratin i 
mouse 



(Z99709) similar to Elongation
factor Tu family (contains
ATP/GTP binding P-loop);
cDNA EST EMBL:D76223
comes from this gene; cDNA
EST yk478c5.5 comes from this
gene [Caenorhabditis elegans]
EPIDERMAL GROWTH



1.2 



1e-05



FACTOR RECEPTOR 
KINASE SUBSTRATE EPS8
>gi|530823 (U12535) epidermal
growth factor receptor kinase 
substrate [Homo sapiens] 



(U95102) mitotic 

phosphoprotein 90 [Xenopus
laevis] 



8e-06 



<NONE> 



<NONE> 



2e-14 



1.5 



<NONE> 



<NONE> 



<NONE> 



<NONE> 






Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION DESCRIPTION P VALUE ACCESSION

Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



250 AJ011592



252 



251 AF027151 



Bacteriophage P1 ban
gene
Xenopus laevis

survival of motor
neuron protein
interacting protein 1
(SIP1) mRNA,
complete cds 



DESCRIPTION 



P VALUE 



0.62 



2493689 



PHOTOSYSTEM II 10 KD

PHOSPHOPROTEIN deltoides]
>gi|2143326|gnl|PID|e319090
(Y13328) 10kDa
phosphoprotein [Populus
deltoides]



Helobdella triserialis 
AJ000376 mRNA for actin 



0.62 



4007790 



(AL034463) putative single- 
strand polynucleotide binding 
protein [Schizosaccharomyces
pombe] 



0.62 



1117968 



253 



Rat thymosin beta 4
M69231 gene (pTB4G), intron.



254 AB021638 



Homo sapiens X11L2 
mRNA for X11-like
protein 2, complete 
cds 



255 



Bacteroides 
gingivalis DNA for 
arginyl 

endopeptidase, 
D26470 complete cds 



(U40763) CARS-Cyp [Homo 
sapiens] sapiens] 



4176370 



0.61 



<NONE> 



A.thaliana ATPase 
J04737 gene, complete cds.



257 



Bos taurus clone 
bml308 

Imicrosatellite and are 
UQ6756 lp repeat region. 



0.61 



(AC005058) similar to calcium- 
independent phospholipase A2; 
similar to AC004392
(PID:g3367519) [Homo 
sapiens] 



0.90 



6e-51



<NONE> 



<NONE> 



<NONE> 



0.61 




258 S75756 



p15=cyclin D-
dependent kinases 4
and 6-binding
protein/p15 product
{exon/intron 1}
[human, brain tumors 
Genomic, 753 nt] 



0.61 



<NONE> 



1922280 



<NONE> 



<NONE> 



(Y09905) snail like protein 
[Gallus gallus]



<NONE> 



<NONE> 



0.51 



259 L39837 



Drosophila 
melanogaster tumor 
supressor (warts) 
mRNA exons 1-8, 
complete cds. 



0.61 



484938 



0.61 



3875131 



hypothetical protein 253 - 
Streptomyces griseus plasmid 
pSG1 (fragment)



(Z70750) similar to vanadate
resistance protein
transmembranous domains
[Caenorhabditis elegans]



0.13 



1e-09







Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



260 U52428

261

X15292

262 AB020663



P VALUE 



Human fatty acid 
synthase gene, partial 
cds 

Plasmodium 
falciparum gene for 
heat-shock protein 
pPf203 

Homo sapiens mRNA
for KIAA0856
protein, partial cds



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
ACCESSION 



DESCRIPTION 



0.61 



4226073 



0.60 



<NONE> 



0.60 



263 U68723

264 M32687

265 AJ005331



Human checkpoint 
suppressor 1 mRNA,
complete cds 



0.60 



S.griseus sporulation 
protein genes 1590 
and 1422. 
Homo sapiens
NKCC2 gene, exon 4,
isoform B



0.60 



266 U14103

267 U95094

268 AE000872

complete cds. 0.60

Xenopus laevis XL-
INCENP (XL-
INCENP) mRNA,
complete cds 0.59

Methanobacterium
thermoautotrophicum
from bases 896604 to
912784 (section 78 of
148) of the complete
genome 0.59



470341 



P VALUE 



to S. pombe phosphatidyl 
synthase (GB:Z28295) 
[Caenorhabditis elegans]



2e-26 



<NONE> 



<NONE> 



544375 



2582017 



(U00043) No definition line
found [Caenorhabditis elegans] 5.7

PROTEIN REGULATOR 
glucose/galactose binding 
protein regulator - 
Agrobacterium tumefaciens
>gi|142228 (L10424)
glucose/galactose binding 
protein regulator 



(AF012871) MergU' [Mus
musculus]



3128353 



4099845 



(AF010496) maltose transport 
inner membrane protein 



1.5 



3282851 



401553 



(AF047897) ankyrin-like protein
HGE-ANK [Ehrlichia sp. BDS1 



5.5 



HYPOTHETICAL 24.5 KD 
PROTEIN IN NADB-SRMB 
INTERGENIC REGION 



4.3 






P VALUE ACCESSION




Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



DESCRIPTION 



269 L11871

270 AF017114



Gallus gallus achaete- 
scute homologue 
(ASH) mRNA, 
complete cds. 



Oryctolagus 
cuniculus glycogen 
synthase mRNA, 
complete cds 



0.59 



628110 



271 AF027807 



272 U81787 



273 U76036 



Homo sapiens beta- 
casein (CSN2) gene, 
complete cds 



Human Wnt10B

mRNA, complete cds
Apteryx australis 16S
ribosomal RNA gene,
mitochondrial gene
for mitochondrial
RNA, partial 
sequence 



0.59 



728856 



0.59 



3252932 



0.59 



274 AB014564 



275 AF044171



Homo sapiens mRNA 
for KIAA0664 

protein, partial cds
Homo sapiens cyclin 
dependent kinase 
inhibitor 2D 

(CDKN2D) gene, 
partial cds 



3875538 



0.59 



4193356 



0.59 



0.59 



1709851 



3925213 



piypuuieucii piuiem - uuunui 

let pes VlrUS 4 miiiUg IfUhie r 
I [Human herpesvirus 4] 2 
[Human herpesvirus 4] 
>gi|I334838|gnl|PID|e25079 4 
[Human herpesvirus 4] 
>gi|1334840|gnl|PID|e25081 6 
[Human herpesvirus 4]
>gi|1334842|gnl|PID|e25067 8
[Human herpesvirus 4]

>gi|1334844|gnl|PID|e25069 10
[Human herpesvirus 4]
>gi|1334846|gnl|PID|e25071 12
[Human herpesvirus 4]



NITROGENASE IRON-IRON 
PROTEIN ALPHA CHAIN 
(NITROGENASE 
COMPONENT I) 
(DINITROGENASE) capsulars
>gi|312238 (X70033)
alternative nitrogenase



(AF067155) truncated rev 
protein [Human 
immunodeficiency virus type 1] 



(Z67990) similar to cuticle
collagen 



(AF055088) ATP-binding 
cassette; PsaB [Streptococcus 
pneumoniae] 



PTB-ASSOCIATED SPLICING
FACTOR (PSF) long form -
human >gi|38458 (X70944)
PTB-associated splicing factor
[Homo sapiens]



P VALUE



4.2 



2.4 



1.5 



1.4 



0.83 



(AL032626) Y37D8A.17
[Caenorhabditis elegans]



0.17 



3e-10 







Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



276 L19640



Z80999
Y11108



P VALUE 



Saccharomyces 
cerevisiae cdc2/cdc28 
related protein kinase 
gene, complete cds.



Hi 



279 U80001

280 Z49637

281 X64467

282 X74506

283 U68040



Human DNA
sequence from
cosmid E140G5 on
chromosome 22,
complete sequence
[Homo sapiens]
H.sapiens WNT8B 
gene 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



0.59 



Sphyraena idiastes 
lactate dehydrogenase 

A 

S.cerevisiae 
chromosome X 
reading frame ORF 
YJR137c
H.sapiens ALAD 
gene for 
porphobilinogen 
synthase 



G.gallus hox B3
mRNA 



Cochliobolus 
heterostrophus
polyketide synthase 



284 



Arabidopsis thaliana
putative auxin efflux
carrier protein (PIN1)
AF089084 mRNA, complete cds
Rattus norvegicus
ROK-alpha mRNA,
U38481 complete cds



286 AF017656

287 M96234

Homo sapiens G
protein beta 5 subunit
mRNA, complete cds
Human glutathione
transferase class mu
number 4.



0.58 
0.58 



0.58 



0.58 



0.58 



0.58 



0.58 



0.58 



0.58 



Human mRNA for 
KIAA0341 gene,
288 AB002339 partial cds



3880115 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 
<NONE> 



<NONE> 



<NONE> 



<NONE> 



(Z81130) T23G11.9
[Caenorhabditis elegans]



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 
<NONE> 

<NONE> 



<NONE> 



<NONE> 




(AC004684) hypothetical 
protein [Arabidopsis thaliana]
(U55366) Similar to cuticle 
collagen [Caenorhabditis 
elegans] 



(U28741) F35D2.1 gene
product [Caenorhabditis 
elegans] 



P VALUE 




1e-21



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 
<NONE> 



<NONE> 



<NONE> 



<NONE> 



9.2 



7.1 







Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



P VALUE 



289 



290 



U11295



D80001



Neisseria 
gonorrhoeae 
carbamoyl phosphate 
synthetase 
(glutamine) small 
subunit (carA) and 
large subunit (carB) 
genes, complete cds. 



Human mRNA for 
KIAA0179 gene, 
partial cds 



291 



Z11700 



Escherichia coli 
genes faeG, faeH, 
faeI, faeJ and IS629-
like insertion
sequence. > ::

emb|Z11710|ECFAEHIJ
E.coli faeH, faeI
and faeJ genes
encoding FaeH, FaeI
and FaeJ proteins 



292 M77350

Mouse hair keratin
A1 (MHKA1) gene,
complete cds.



293 X63787 



294 



T.thermophila gene
for snRNA U3-2 



D63881 



Human mRNA for 
KIAA0160 gene,
partial cds



E 



295 



U39378 



Gymnocarena
mexicana 16S
ribosomal RNA gene,
mitochondrial gene 
encoding 

mitochondrial RNA, 
partial sequence 



296 



X87987 



P.pastoris PRC1 gene 
> 

dbj|E12103|E12103 
DNA encoding 
precursor of protease 
from Pichia pastoris 



0.58 



0.58 



0.58 



0.58 




0.58 



0.58 



2826900 



1934730 



0.58 



2I94I3I 



(AB004461) DNA polymerase
alpha catalytic subunit [Oryza 
sativa] 



(U95036) germin-like protein 
[Arabidopsis thaliana]



(AC002062) Similar to 
Synechocystis antiviral protein 



0.58 



3914197 



OCCLUDIN >gi|1276983
(U49221) occludin [Canis
familiaris]

>gi|1589181|prf||2210347D
occludin [Canis familiaris] 



3.1 



3.1 



3.1 



3.1 






Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



297 I X' ? 5782 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



A.thaliana (L.Heynh.)
chloroplast mRNA 
for recombinant APS 
kinase 



298 M64848



Mouse platelet- 
derived growth factor 
B chain musculus 
platelet-derived
growth factor beta- 
chain (sis) gene, exon 
5. 



0.58 



299 



Helicobacter pylori, 
strain J99 section 21
of 132 of the 
AE001460 complete genome 



0.58 



0.58 



300 X65720

M.musculus gene for
protein kinase C-
gamma (exon 1 and
exon 2) 



Arabidopsis thaliana
301 AF043130 lactate dehydrogenase

Human genes for



0.58 



0.58 



302 D28116 



303 



304 



305 



AE001075 



AF003948 



U 10692 



collagen type IV

alpha 5 and 6, exon 1
and exon 1' 0.58

Archaeoglobus
fulgidus section 32 of
172 of the complete
genome 0.58



Rhodococcus opacus 
chloromuconate 
cycloisomerase 
transposase homolog
genes, complete cds



Human MAGE-7 
antigen (MAGE7) 
pseudogene, complete 
cds. 



0.58 



0.58 



DESCRIPTION 



1732444 



P VALUE 



(D38529) DRPLA protein 
[Homo sapiens]



3025832 



(AF055985) pyrrolidone-rich
antigen [Onchocerca volvulus]



2827198 



418395 



(AF037454) ubiquitin protein 
ligase [Mus musculus]
CHD1 PROTEIN

>gi|320737|pir||S30818
hypothetical protein YER164w
yeast (Saccharomyces
cerevisiae) >gi|603404
(U18917) Chd1p: transcriptional
regulator [Saccharomyces
cerevisiae]



3024637 



SEX-DETERMINING 
REGION Y PROTEIN 
determining protein [Mus 



1.1 



1.1 



0.62 



1458250 



2276333 



477072 



(U64835)T09D3.3 
[Caenorhabditis elegans]



(Z97991) hypothetical protein 
Rv0336 



3287858 



mucin 7 precursor, salivary 
human



HOMEOBOX PROTEIN HOX-
C11



0.36 



0.36 



0.28 



0.054 






SEQ
ID

Nearest Neighbor (BlastN vs. Genbank)

ACCESSION

DESCRIPTION

P VALUE

Nearest Neighbor (BlastX vs. Non-Redundant Proteins)

ACCESSION

DESCRIPTION

P VALUE


306 


AF003948


Rhodococcus opacus 
chloromuconate 
cycloisomerase 
transposase homolog
genes, complete cds 


0.58 


3551821 


(AF058803) mucin 4 [Homo 
sapiens] 


0.041 


307 


X99350 


H.sapiens HFH4 
gene, exon 1 and 
joined CDS 


0.58 


137483 


VAV PROTO-ONCOGENE 
>gi|55221 (X64361)proto- 
oncogene [Mus musculus] 


0.024 


308 


AJ234282 


Homo sapiens mRNA 
for Ig heavy chain 
variable region, clone 
C 


0.58 


3264846


(AC003682) R27945 2 [Homo 
sapiens] 



0.018 


309 


AF079310 


Mus musculus histone 
deacetylase 3 
(Hdac3) gene, exons 
4 through 15 and 
complete cds 


0.58 


1657601 


(U66220) unknown 
[Nannocystis exedens]


0.014 


310 


AF019367 


Human thiopurine 
methyl transferase 
(TPMT) gene, exons 
6 and 7 


0.58 


< 

3283352


(AF063020) lens epithelium-
derived growth factor [Homo
sapiens]


0.011 


311 


X65720 


M.musculus gene for
protein kinase C- 
gamma (exon 1 and 
exon 2) 


0.58 


(U38291) microtubule- 
associated protein 1a [Homo
1790878 sapiens] 


0.008 


312 


AB011155 


Homo sapiens mRNA 
for KIAA0583 
protein, partial cds 


0.58 


SYNAPSINS IA AND IB 
1351166 >gi|163713


0.006 




313


H.sapiens mRNA for
DNA 


0.58 


(D84307) phosphoethanolamine 
cytidylyl transferase [Homo 
1817548 sapiens] 


0.001 


314 


U53746

K00436

Feline

immunodeficiency
virus isolate FIV-
>co336-8 pol
polyprotein (pol)

gene, partial cds

Rattus norvegicus
(clone rtl-I) pseudo-
gly-tRNA gene.

0.58 
0.58 


(U93872) ORF 73, contains 
2246532 large complex repeat CR 73 

(M64793) salivary proline-rich
206712 protein [Rattus norvegicus]


2e-05 
1e-05


315






Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



316 S79632

317 D43964

HSF2=heat shock
factor 2 {alternatively
spliced, splice
junction region}
[mice, CBA/J, testis,
Genomic, 120 nt,
segment 2 of 3]



0.58 



Rat liver mRNA for 
Can-1, complete cds 



0.58 



318 AB007918

319 AB001466

Homo sapiens mRNA
for KIAA0449

protein, partial cds 0.58
Homo sapiens mRNA
for Efs1, complete
cds 0.58



320 Z11701

Saccharomyces
cerevisiae IRE1 gene
for putative protein 
kinase. 



S.cerevisiae 



321 Z49535
322 M62506

323 X05944

chromosome X
reading frame ORF

YJR035w

S.cerevisiae DBF20
gene, complete cds.
Yeast PSS gene for
phosphatidylserine
synthetase 



324 D38536

325 Z75004

326 L77034

Snail gene for ADP
ribosyl cyclase,
complete cds
S.cerevisiae

chromosome XV
reading frame ORF
YOR096w



0.58 



0.57 



0.57 






DESCRIPTION 



P VALUE 



4038594 



Homo sapiens 
(subclone 10_e10
from P1 H16) DNA
sequence. 



0.57 



1280135 



2833239 



2943716 



3880115 



4106562 
<NONE> 



(AJ222798) tDETl protein 

[Lycopersicon esculentum] 
(lb:) J/6) coded tor by C. 

elegans cDNA cm21e6; coded 
for by C elegans cDNA 
cm01e2; similar to melibiose 
carrier protein 

(thiomethylgalactoside permease 

m 

EPIDERMAL GROWTH

FACTOR RECEPTOR
KINASE SUBSTRATE EPS8
>gi|530823 (U12535) epidermal
growth factor receptor kinase
substrate [Homo sapiens]

(D45027) 25 kDa trypsin
inhibitor [Homo sapiens]



3e-06 



1e-08



(Z81130) T23G11.9
[Caenorhabditis elegans]



(Z83 8 19)dJ140II21.2 (sim i lar 



3e-l 



2e-14 



9e-21 



to CYTOCHROME B-245 
HEAVY CHAIN) [Homo 
sapiens]



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



3e-33 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



0.57 



<NONE> 



<NONE> 



<NONE> 



<NONE> 






SEQ 

ACCESSION 



Nearest Neighbor (BlastN vs. Genbank) 



I 



DESCRIPTION 



327 



Cyprinus carpio c-
myc gene for c-Myc,
D37887 complete cds



328 AB014562



329 



Homo sapiens mRNA 

for KIAA0662

protein, partial cds
Human DNA

sequence from
cosmid L75B9,
Huntington's Disease
Region, chromosome 
Z69651 4pl6.3 



330 D89285



Mesocricetus auratus 
mRNA for inter-alpha 
trypsin inhibitor 
heavy chain 1, 
complete cds 



331 



332 



Z48951

S.cerevisiae
chromosome XVI
cosmid 9723



X95573 



A.thaliana mRNA for 
salt-tolerance zinc 
finger protein 



333 U95094



Xenopus laevis XL- 
INCENP (XL- 
INCENP) mRNA, 
complete cds 



334 



335 



AE001116



Z34291 



Borrelia burgdorferi 
(section 2 of 70) of 
the complete genome 



0.57 



0.57 



0.57 



0.57 



0.57 



0.57 



0.57 



R.norvegicus mRNA 
for putative chloride 
channel. 



0.57 



P VALUE ACCESSION

Nearest Neighbor (BlastX vs. Non-Redundant Proteins)

DESCRIPTION 



197406 



1079280 



P VALUE 




(M57576) Ig kappa chain [Mus 
musculus] 



chaperonin containing TCP-1
complex gamma chain - African
clawed frog >gi|793886
(X84990) Cctg 



134132 



4210432 



1174828 



RYANODINE RECEPTOR, 
SKELETAL MUSCLE 



(AJ130783) APC2 protein [Mus
musculus] 



TYROSINE
DECARBOXYLASE 2
(EC 4.1.1.25) - parsley >gi|169671
(M96070) tyrosine
decarboxylase [Petroselinum



465646 



2314735 



1350832



PROBABLE ABC
TRANSPORTER ATP-
BINDING PROTEIN IN
NTRA/RPON 5'REGION
(ORF1) Azorhizobium
caulinodans >gi|311388
(X69959) ORF1



(AE000653) Na+/H+ antiporter
(nhaA) [Helicobacter pylori 
26695] 



8.9 



DNA-DIRECTED RNA
POLYMERASE I SECOND
LARGEST SUBUNIT (RNA
POLYMERASE I SUBUNIT 2)
chain RPA2 - Euplotes
octocarinatus (SGC9)
>gi|578407 octocarinatus]



8.9 



6.9 



5.3 



5.2 



4.0 



4.0 



3.0 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



336 



D88255 



337 | AF037261



Homo sapiens A30 
Vk germline gene, 
partial cds 



338 | U26595



Homo sapiens SH3-
containing adaptor
molecule-1 mRNA,
complete cds

Rattus norvegicus
prostaglandin F2a
receptor regulatory
protein precursor,
mRNA, complete cds



339 X69903 



R.norvegicus mRNA 
for interleukin 4 
receptor 



340 | Z74825



S.cerevisiae
chromosome XV
reading frame ORF
YOL083w



341 | AJ131469 



Foot-and-mouth
disease virus O vp1
gene, strain O/A/58



342 | AF011360

Mus musculus
regulator of G-protein
signaling 7 (RGS7)
mRNA, complete cds

343 | AF011360

Mus musculus
regulator of G-protein
signaling 7 (RGS7)
mRNA, complete cds



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



DESCRIPTION 



0.57 



3875983 



0.57 



1397341 



0.57 



2773160 



0.57 



2649193 



0.57 



(Z31063) similar to Actinin-type
actin-binding domain containing
proteins [Caenorhabditis
elegans]

protein; coded for by C. elegans
cDNA yk184h5.3; coded for by
C. elegans cDNA yk184h5.5;
coded for by C. elegans cDNA
yk13d7.3; coded for by C.
elegans cDNA yk13d7.5; coded
for by C. elegans cDNA
yk31e1.5; co... >gi|3493541
(AF057567) kinesin-like protein
ZEN-4a [Caenorhabditis
elegans]



P VALUE 



(AF039656) neuronal tissue- 
enriched acidic protein [Homo 
sapiens] 



(AE001009) quinone-reactive
Ni/Fe-hydrogenase B-type
cytochrome subunit (hydC)
[Archaeoglobus fulgidus]

(U64S46) F47D2.5 gene 



1458319 elegans]




proline-rich protein - mouse 
(fragment) musculus] 



3.0 



gelsolin - American lobster 



gelsolin - American lobster 
>gi|452313 gelsolin [Homarus 
americanus] 



2.3 



2.3 



1.8 



1.4 



1.4 



0.80



0.80 






SEQ ID | ACCESSION | DESCRIPTION



344 | L39210



Homo sapiens inosine 
monophosphate 
dehydrogenase type II 
gene, complete cds 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



345 U81523 



346 U46561 



Human endometrial 
bleeding associated 
factor mRNA, 
complete cds 



Tetrahymena 
thermophila 
polyubiquitin (TTU3) 
gene, complete cds, 
and RNA polymerase 
II subunit 2 (RPB2) 
gene, partial cds



0.57 



0.57 



347 X95543 



348 | Y17282



349 



350 



X00716



X53238 



C.japonica mRNA for
legumin (clone
CjLe 8 31)



Homo sapiens mRNA 
for cytokeratin type II 



Frog mRNA fragment
for alpha-A2-
crystallin



351 I X99012 



Klebsiella sp.
bacteriophage K11
gene 1 for RNA
polymerase

H.sapiens FUS gene, 
exon 12 



0.57 



0.57 



0.57 



352 I AJL0087U 



353



S74506 



Human DNA 
sequence from PAC 
390N22 on 
chromosome Xp22.2 



0.57 



SOX9 [human, fetal
brain, Genomic, 1494
nt, segment 3 of 3]



0.57 



DESCRIPTION 



559526 



P VALUE 



(X77466) 98.8kD polyprotein 
[Strawberry latent ringspot 
virus] 



0.79 



211499 



2506493 



(K01702) HMW/LMW collagen
subunit precursor [Gallus gallus] | 0.79



1709261 



3044086 



3406654 



1228093 



/> 243898 



HYPOTHETICAL 100.5 KD
PROTEIN IN IAP-CYSH
INTERGENIC REGION
>gi|882654 (U29579) alternate
gene name ygcB; ORF_f888
[Escherichia coli] >gi|1789119 | 0.60
NEUROFILAMENT TRIPLET
M PROTEIN (160 KD
NEUROFILAMENT
PROTEIN) (NF-M)
>gi|1083164|pir||S55395
neurofilament protein M - rabbit
(fragment) >gi|854353 | 0.46



(AF055904) unknown 
[Myxococcus xanthus] 



0.45 



(AF079369) transcriptional 
repressor TUP1 [Dictyostelium
discoideum] 



0.20 



(Z46913) polyketide synthase



1469545 



1326350 



(S78897) GOR=antigenic
epitope [chimpanzees, Peptide,
427 aa] [Pan]



(U53585) fibronectin attachment
protein [Mycobacterium avium]
(U58748) similar to potential
transmembrane domains in S.
cerevisiae nuclear division
RFT1 protein (SP:P38206)



0.16 



0.090 



0.053 



0.017 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



354 | D25542 



P VALUE 



Human mRNA for 
golgi antigen gcp372, 
complete cds 



0.57 



355 | AB015426 



356 | X51394



357 | AB007918



358 | AB001466



Mus musculus mRNA
for alpha1,3-
fucosyltransferase IX,

complete cds | 0.57

Xenopus mRNA for 
APEG protein, 
containing a highly 
repetitive amino acid 
sequence | 0.57 



359 | Y00760 



Homo sapiens mRNA
for KIAA0449

protein, partial cds | 0.57
Homo sapiens mRNA
for Efs1, complete

cds | 0.57

Rabbit mRNA for 

adult fast skeletal 

troponin-C | 0.57 



360 | X95153



362 | U09251



H. sapiens brca2 gene 
exon 3 > :: 

emb|A62778|A62778 



Patent WO9719110

B.vulgaris mRNA for

betavulgin
Mycoplasma

genitalium DNA

gyrase subunit B

complete cds, DNA

polymerase III beta

subunit (dnaN) and

seryl-tRNA 

synthetase (serS) 

genes, partial cds. 



0.57 
0.56 



0.56 



363 | V00158



Chloroplast Euglena
gracilis genes coding
for transfer RNAs
specific for threonine,
glycine, methionine,
serine and glutamine. | 0.56



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



DESCRIPTION 



4063399 



2661842 



1929056 



2833239 



2943716 



2576348 



3419847 
<NONE> 



<NONE> 



<NONE>



P VALUE 



(AF102575) cell surface protein
DTFA [Dictyostelium
discoideum] | 0.005



(Y15732) DNA polymerase beta
[Xenopus laevis] | 7 c _j|



(Y12090) putative 3,4-
dihydroxy-2-butanone kinase
[Lycopersicon esculentum]

EPIDERMAL GROWTH



9e-12 



FACTOR RECEPTOR 
KINASE SUBSTRATE EPS8
>gi|530823 (U12535) epidermal 
growth factor receptor kinase 
substrate [Homo sapiens] 



3e-13



(D45027) 25 kDa trypsin
inhibitor [Homo sapiens] 



2e-14 



(AC002400) Glutamyl tRNA
synthetase [Homo sapiens] 



2e-28 



(AC004982) similar to yeast
hypothetical protein ybk4;
similar to h'J!
(PID:g586461) [Homo sapiens]

<NONE> 



2e-55 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 






Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



Clostridium 



P VALUE 



364 



D88151



365 U67478 



366 



367 



L23800 



AB011129



368 



369 



L77034 



Z47202 



perfringens DNA for
D-alanine:D-alanine
ligase, cortical
fragment-lytic
enzyme



Methanococcus 
jannaschii section 20 
of 150 of the 
complete genome 



0.56 



Tachyglossus 
aculeatus beta-globin 
homolog (HBB) 
gene, complete cds 



Homo sapiens mRNA
for KIAA0557
protein, partial cds



0.56 



Homo sapiens
(subclone 10_e10
from P1 H16) DNA
sequence.



0.56 



370 



U53868 



371 



372 



AF041259 



L42636 



C.albicans gene for 

TFIIIB (BRF1) 

subunit. 

Clostridium

acetobutylicum

mannitol-specific

phosphotransferase

system (PTS) system,

mtlA, mtlR, mtlF, and

mtlD genes, complete

cds

Homo sapiens breast 
cancer putative 
transcription factor 
(ZABC1) mRNA, 
complete cds 



0.56 



0.56 



Plasmodium
falciparum variant-
specific surface
protein (var-7)
mRNA, complete cds.



0.56 



0.56 



0.56 



ACCESSION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE>



<NONE> 



2213557 



DESCRIPTION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 






ACCESSION | DESCRIPTION | P VALUE | ACCESSION



373 | U96180 



374 | L76259



375 | AF045946



Human protein 
tyrosine phosphatase 
(TEP1) mRNA,
complete cds 



376 | X97986



Homo sapiens PTS
gene, complete cds
Mus musculus

D16Jhu17 YAC
98B3 acentric end,
partial sequence
M.musculus mRNA
for desmocollin type


0.56 



0.56 



I 



0.56 



0.56 



377 | X79437



378 



379 



M27902 



M.musculus whey
acidic protein (WAP)
gene, exon 1 | 0.55

Rat cardiac specific
sodium channel alpha-
subunit mRNA,
complete cds. | 0.56



Caenorhabditis
elegans cosmid
AF036696 F15B10



0.56 



380 



381 



Z99102



L27850 



Caenorhabditis 
elegans cosmid 
B0331, complete 
sequence 

[Caenorhabditis 
elegans] 



0.56 



Equus caballus (clone 
T131) T-cell receptor 
DNA, V-region.



0.56 



DESCRIPTION 



731016 
2369863 



P VALUE 



THIOREDOXIN REDUCTASE |
thioredoxin reductase (NADPH)
[Coxiella burnetii]
(Y12225) Spi-1/PU.1
transcription factor



2130017 



4038031



549670 



585234 



546071 



603664 



hypothetical protein - common 
sunflower protein [Helianthus 
annuus] 

(AC005936) hypothetical
protein [Arabidopsis thaliana]



COMPONENT SPC42 - yeast
(Saccharomyces cerevisiae)
>gi|486054 (Z28042) ORF
YKL042w [Saccharomyces
cerevisiae] >gi|666098
(X71621) hypothetical 42.3 kD
protein [Saccharomyces 
cerevisiae] 



5.1 



ENDOGLUCANASE G
PRECURSOR (EC 3.2.1.-) CelCCG
precursor - Clostridium
cellulolyticum cellulolyticum]



gp70=envelope protein



(endogenous provirus) host=cat
lymphoid tissues. Peptide, 445

aa]



1079150 



l^TUl; putative reverse 
transcriptase; ORF2; encodes aa 
motifs conserved in reverse 
transcriptases; most closely 
related reverse transcriptases are 
those of non-LTR 
retrotransposons. The 3' 901 bp 
of this CDS are identical to the 
901 bp ... 



3.0 



transcription factor shn - fruit fly 



1.7 






SEQ 

ID



382 






383 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



X97986 



P VALUE 



M.musculus mRNA 
for desmocollin type 



1 



384 



AF087455 



385 



386 



387 



388 



389 



391 



D80011 



AJ002272 



Didelphis virginiana 
G protein receptor 
kinase 2 mRNA, 
complete cds 



Human mRNA for 
KIAA0189 gene,
complete cds 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



DESCRIPTION 



0.56 



2497227 



0.56 



L39210 



X02770 



AF038575 



Mus musculus mRNA 
for HAP1-A protein, 
3' region 



Homo sapiens inosine 
monophosphate 
dehydrogenase type II 
gene, complete cds 



Mouse Thy-1.2 gene
5' untranslated region 
and exon 1 



0.56 



1213453 



0.56 



0.56 



0.56



Schizosaccharomyces
pombe Wiskott-
Aldrich Syndrome
protein homolog
(wsp1+) gene,
complete cds, and
BTF3/beta-NAC
gene, partial sequence



0.56 



X56747 



390 | Y12072



S75756 



Rat mRNA for fetal 
intestinal lactase- 
phlorizin hydrolase 
precursor, partial
G.arboreum mRNA
for farnesyl
pyrophosphate 
synthase 



0.56 



p15=cyclin D-
dependent kinases 4
and 6-binding
protein/p15 product
[exon/intron 1]
[human, brain tumors,
Genomic, 753 nt]



0.56 



0.56 



226535 



P VALUE 



HYPOTHETICAL 113.1 KD
PROTEIN IN PRE5-FET4
INTERGENIC REGION
>gi|1072409 (Z54141) unknown



(U12964) contains ankyrin-like
repeats; similar to human 
desmoplakin repeat region 
[Caenorhabditis elegans] 



protease [Hepatitis B virus] 



1.7 



1.3 



3327158 



(AB014572) KIAA0672 protein 
[Homo sapiens] 



1.1 



628431 



3327046 



SS466 



coat protein - strawberry latent 
ringspot virus 



(AB014516) KIAA0616 protein
[Homo sapiens] 



2072742 



296670 



1082743 



salivary proline-rich
phosphoprotein precursor PRH1
(allele P1F) - human >gi|190484 |
(K03203) prepro salivary
proline-rich protein [Homo
sapiens] >gi|190512



(Z48674) chitinase homologue 
[Sesbania rostrata]



(X07882) Po protein [Homo 
sapiens] 



protein kinase (EC 2.7.1.37)
SPRK - human sapiens]
>gi|1090771|prf||2019437A
protein Tyr kinase



1.0 



0.77 



0.59 



0.35 



0.23 



0.20



0.15







SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



392 U62528 



393 | X96877



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



Equus caballus type



DESCRIPTION 



II collagen mRNA,

complete cds | 0.56

C.reinhardtii mRNA
for unknown lumenal
polypeptide | 0.56



394 | S78788



395 | AF006640



cGATA-3 [chickens,
liver, Genomic, 979
nt, segment 4 of 4]



Drosophila 
melanogaster Ste20- 
like protein kinase 
mRNA, complete cds



Drosophila 
melanogaster Ste20- 
like protein kinase



397 | AE000716



398 | Z36079



399 



Y17267 



Aquifex aeolicus 
section 48 of 109 of 
the complete genome 
S.cerevisiae 
chromosome II 
reading frame ORF 

YBR210w _ 

Mus musculus mRNA 
for ubiquitin 
conjugating enzyme 



400 | AC001461



Homo sapiens 
(subclone 2_g5 from 
BAC HI 07) DNA 
sequence 



461671 



3341678 



0.56 



2661590 



0.56 



1109830



0.56 



3688350 



0.55 



0.55 



<NONE> 



<NONE> 



0.55 



<NONE> 



P VALUE 



[Segment 1 of 2] COLLAGEN

ALPHA 1(I) CHAIN | 0.030

(AC003672) putative zinc finger

protein [Arabidopsis thaliana] | 5e-09



(AL009196) 1-

evidence=predicted by content;
1-method=genefinder;Q84; 1-
method_score=59.41; 1-
evidence_end; 2-
evidence=predicted by match; 2-
match_accession=AA950019; 2-

match_description=LD29959.5prime
LD Drosophila
melanogas...



(U41534) coded for by C.
elegans cDNA CEESI42F;
Similar to helicases of
SNF2/RAD54 family.
[Caenorhabditis elegans]



2e-11



(U41534) coded for by C.
elegans cDNA CEESI42F; 
Similar to helicases of 
SNF2/RAD54 family. 



6e-12 



(aL()3U996j dJl 1S5B241 

(novel PUTATIVE protein
similar to hypothetical proteins

S. pombe C22F3.14C and C.
elegans C16A3.8) [Homo
sapiens]



3e-66 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



\ 



<NONE> 






Nearest Neighbor (BlastN vs. Genbank)



ACCESSION | DESCRIPTION | P VALUE | ACCESSION



401 | AF019079



402 M90058 



Alouatta seniculus



breast and ovarian
susceptibility
(BRCA1) gene,
partial cds



0.55 



Human serglycin 
gene, exons 1,2, and 
3. 



0.55 



Mus musculus CLM2 
gene for cytohesin 2, 
complete and partial 
cds, alternative 



403 | AB013469 splicing



0.55 



404 | AJ011592

Bacteriophage P1 ban
gene

405 | Z15118

T.brucei kinetoplast
maxicircle variable
region DNA

406 | Z48951

S.cerevisiae
chromosome XVI
cosmid 9723



407 U78726 



408 | AG001389



409 



M27640 



Homo sapiens mad 
protein homolog 
Smad2 gene, 
promoter, exon 1a
and exon 1b



Homo sapiens 
genomic DNA, 21q 
region, clone: 
9H1 lBm42 



Plasmodium vivax 
major blood stage 
surface antigen gene, 
partial cds. 



0.55 



0.55 



0.55



0.55 



0.55 



0.55 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



<NONE> 



<NONE> 



1729760 



2493689 



2970432 



4210432 



3319290 



DESCRIPTION 



549453 




P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



(Z68152) chitinase [Gossypium 
hirsutum] 



PHOTOSYSTEM II 10 KD
PHOSPHOPROTEIN deltoides]
>gi|2143326|gnl|PID|e319090
(Y13328) 10kDa
phosphoprotein [Populus
deltoides]



(AF049132) NADH 
dehydrogenase subunit 5 
[Florometra serratissima] 



(AJ130783) APC2 protein [Mus
musculus]



(AF055994) thyroid hormone
receptor-associated protein 
complex component TRAP220 
[Homo sapiens] 



KRUEPPEL PROTEIN
>gi|72899|pir||TWFF Krueppel
gap protein - fruit fly
(Drosophila sp.) melanogaster]
>gi|224875|prf||1202348A



125684 Krueppel gene



X-LINKED PEST-

CONTAINING
TRANSPORTER transporter -
human >gi|458255 (U05321) X-
linked PEST-containing
transporter [Homo sapiens]



6.6 



6.5 



3.8 



3.8 






ACCESSION DESCRIPTION 



P VALUE 



410 | D37977

Fugu rubripes mRNA
for sodium channel
alpha subunit, partial
cds

411 | M88505

Ostertagia ostertagi
cathepsin B-like
cysteine protease
gene, partial cds.

412 | U95098

Xenopus laevis
mitotic
phosphoprotein 44
mRNA, partial cds



413 | U89241



414 | AF027151 



415 | AF006821 



416 | Y12736



Human mibp gene,
partial cds
Xenopus laevis
survival of motor
neuron protein
interacting protein 1
(SIP1) mRNA,
complete cds
Bufo marinus
natriuretic peptide
receptor C mRNA,

partial cds

Lactococcus lactis
cremoris plasmid
pJW565 DNA.
MabiiM, llabiiR genes
and orfX



417 | U38307

Mus musculus
collagen alpha-1 type
I gene, 5' flanking
region, partial
sequence.

418 | D13473

Mouse mRNA for
Rad51 protein

419 | AF045238

Bungarus fasciatus
acetylcholinesterase
gene, alternatively
spliced products,
partial cds

420 | AE000795

Methanobacterium
thermoautotrophicum
from bases 1 to
10208 (section 1 of
148) of the complete
genome



0.55 



0.55 



0.55 



0.55 



0.55 



0.55 



0.55 



0.55 



0.55 



0.55 



0.55 



ACCESSION 



DESCRIPTION 



P VALUE 



1435038 



3941277 



2570154 
4097465 



4007790 



2245075 



3386334 



1362802 
1374698 



3261734 



(D38024) ORF [Homo sapiens]



(AF000900) p45 [Rattus 



(AB008376) 17-kDa PKC-
potentiated inhibitory protein of

PP1 [Sus scrofa]

(U62253) 16kDa secretory
protein [Sus scrofa]



(AL034463) putative single- 
strand polynucleotide binding 
protein [Schizosaccharomyces
pombe]



(Z97343) GTP-binding RAB2A 
protein 



(AF035120) type I procollagen
pro-alpha 2 chain [Canis
familiaris]



1.7 



1.7 



gastric mucin - human 
(fragment) >gi|547517 
(D83032) nuclear protein,
NP220 [Homo sapiens]



(Z94752) hypothetical protein
Rv1004c



1.3 



186396 



(M94131) mucin [Homo
sapiens] 



0.99 



0.97 







SEQ 

ID | ACCESSION



Nearest Neighbor (BlastN vs. Genbank) 



423 



424 



426 



427 



428 



DESCRIPTION 



421 | X99537



P VALUE 



Y.lipolytica SEC62



gene



422 U08147 



Z56586 



Aquilegia sp. 
phytochrome 
(PHYB/D) gene, 
partial cds.



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



0.55 



H.sapiens CpG DNA,
clone 12c8, reverse
read cpg12c8.rt1d.



Mus musculus 
glutamine:fructose-6-
phosphate
amidotransferase 
(GFAT) gene, 5' 
U39442 region and partial cds 



0.55 



0.55 



425 | K02298

Rat chymotrypsin B
(chyB) gene,
complete cds.



X84792

M.musculus clusterin
gene



U00185

Capra aegagrus
(Saanen and Weisse
Edel breeds) DR beta
chain antigen binding
domain, MHC class II
DRB



0.55 



0.55 



0.55 



H.sapiens CpG DNA,
clone 178a12, reverse
Z54946 read cpg178a12.rt1a



429 



430 



AF031650 



Oryctolagus
cuniculus anion
exchanger 3 brain
isoform (AE3)
mRNA, complete cds



0.55 



0.55 



M25579 



431 



Bovine adenylyl
cyclase Type I
mRNA, complete cds.



Z48796 



H.sapiens Ski-W 
mRNA for helicase 



0.55 



0.55 



ACCESSION 



DESCRIPTION 



3876397 



2338024 



(Z81068) F25H5.2 



[Caenorhabditis elegans]



(AF005370) ribonucleotide- 
reductase, large subunit 



3320122 



(U46007) espin [Rattus 
norvegicus] 



282600 



3413810 



1652475 



hypothetical protein - 
Mycoplasma hyorhinis



(Y17034) Bassoon [Mus 
musculus] 



2507136 



SUBTILIN BIOSYNTHESIS 
PROTEIN SPAB 



807646 



(M17294) unknown protein 
[Human herpesvirus 4]




1778210 



(U68412) fibrillar collagen 
[Arenicola marina]



2649040 



0.55 



330452 



(AE000997) conserved 
hypothetical protein 
[Archaeoglobus fulgidus]



(M14708) DNA polymerase
[Human cytomegalovirus] 



P VALUE 



0.58 



0.57 



0.44 



0.43 



0.33 



(D90905) hypothetical protein | 0.25



0.19 



0.044 



0.023 



0.023 






Nearest Neighbor (BlastN vs. Genbank)



ACCESSION | DESCRIPTION | P VALUE



432 | M80234



Cow dopamine
transporter mRNA,
putative cds.



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION



DESCRIPTION 



P VALUE 



0.55 



3874972 



similar to Elongation
factor Tu family (contains
ATP/GTP binding P-loop);
cDNA EST EMBL:D76223
comes from this gene; cDNA
EST yk478c5.5 comes from this
gene [Caenorhabditis elegans]



4e-04 



(Z68314) similar to G-protein;
cDNA EST EMBL:C11959
comes from this gene; cDNA

EST EMBL:C10341 comes
from this gene; cDNA EST
yk494e4.3 comes from this
gene; cDNA EST yk448a8.5
comes from this gene comes
from this gene; cDNA EST
EMBL:C10341 comes from this
gene; cDNA EST yk494e4.3
comes from this gene; cDNA
EST yk448a8.5 comes from this
gene [Caenorhabditis elegans]
>gi|3880364|gnl|PID|e1349948
(Z83016) similar to G-protein;
cDNA EST EMBL:C11959
comes from this gene; cDNA
EST EMBL:C10341 comes



from this gene; cDNA EST
yk494e4.3 comes from this
gene; cDNA EST yk448a8.5
comes from this gene
[Caenorhabditis elegans]



433 | U91616 



434 | D10910



435 | L22013



436 | Z92653



Human I kappa B
epsilon (IkBe)
mRNA, complete cds
Arabidopsis thaliana
Atpk7 gene for
serine/threonine
protein kinase,
complete cds

Swinepox virus
complete ORFS
C20L-C1L > ::
gb|I58297|I58297
Sequence 14 from
patent US 5651972

Human

immunodeficiency
virus type 1 env gene



0.55 



3875577 



0.55 



3876072 



(Z81505) Similarity to
Methanococcus hypothetical
protein 0682 (TR:Q58095)
[Caenorhabditis elegans]



7e-06 



0.54 



<NONE> 



<NONE> 



0.54 



<NONE> 



<NONE> 



4e-42 



<NONE> 



<NONE> 






443 



445 



Nearest Neighbor (BlastN vs. Genbank)



ID | ACCESSION



DESCRIPTION



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



437 | K01992



AE001415



439 | AF064030



440 | X12591 



E.coli phosphate-

repressible 

periplasmic 

phosphate-binding 

protein (phoS), 

peripheral membrane 

proteins (pstC, pstB 

and phoU) and 

integral membrane

protein (pstA) genes, 

complete cds. 
Plasmodium 

falciparum 

chromosome 2, 

section 52 of 73 of 

the complete 

sequence 



Helianthus tuberosus
lectin 2 mRNA,
complete cds



E.coli plasmid DNA
for colicin E9



441 U73679 



442 Z93990 



X85967 



444 [ U76524 



Caenorhabditis 
elegans YNK1-a
mRNA, complete cds 



Unidentified 
bacterium DNA for 
16S ribosomal RNA 



B.vulgaris mRNA for 



betavulgin



Sambucus nigra
ribosome inactivating
protein precursor
mRNA, complete cds



X71800



446 | U89241 



H.sapiens gene for 5S
rRNA (640 bp) > ::
emb|X71801|HS5SR640B
H.sapiens gene
for 5S rRNA (640 bp)



Human mibp gene, 
partial cds 







SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



447 | L16013 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
ACCESSION 



DESCRIPTION 



448 | U60275



Rattus norvegicus Q-
like gene sequence | 0.54



3087760 



Capra hircus skeletal 
muscle voltage-gated 
chloride channel 
gCJC-1 mRNA, 
partial cds 



0.54 



1781344 



449 | U36795



450 | AF053091



451 



V00602 



Myxococcus xanthus 
rfbABC O-antigen 
biosynthesis operon, 
rfbA, rfbB, and rfbC 
genes, complete cds. 



(AJ005583) p75 protein
[Crypthecodinium cohnii]



(Y10438) FK506 polyketide
synthase



0.54 



Drosophila 
melanogaster eyelid 
(eld) mRNA, 
complete cds 



3877232 



(Z81540) predicted using
Genefinder



Genome of the
bacteriophage fd
(Inoviridae).



0.54 



0.54 



2144110 



2661620 



452 



U60800 



453 | X85969



Human semaphorin
(CD100) mRNA,
complete cds 



0.54 



125682 



zinc finger protein RI2 - rat
>gi|949996

(AL009197) hypothetical
protein

KERATIN, ULTRA HIGH-
SULFUR MATRIX PROTEIN
(UHS KERATIN)

>gi|109116|pir||A36686 ultra-
high-sulfur keratin - sheep
>gi|1306 (X55294) ultra high-
sulphur keratin protein [Ovis
aries]



S.coelicolor secD, 
secF & apt genes 



454 



H.sapiens mRNA for 
DAN26 protein, 
Y08265 partial



0.54 



3874972 



0.54 



3875131 



(Z99709) similar to Elongation
factor Tu family (contains
ATP/GTP binding P-loop);
cDNA EST EMBL:D76223
comes from this gene; cDNA
EST yk478c5.5 comes from this
gene [Caenorhabditis elegans]



(Z70750) similar to vanadate
resistance protein
transmembranous domains
[Caenorhabditis elegans]



P VALUE 



0.95 



0.95 



0.74 



0.14 



0.11



0.003 



7e-06 



5e-12 







AF034597 



457 K02653 




X53416 




Habrobracon hebetor 
cytochrome oxidase 
II gene, partial cds; 
and tRNA-Asp, tRNA 
His, and tRNA-Lys 
genes, complete 
sequence, 

mitochondrial genes 
for mitochondrial 
products 



0.53 



<NONE> 



<NONE> 



<NONE> 



Yeast (S.cerevisiae)
tau repetitive element
and Cys-tRNA. | 0.53
Human mRNA for
actin-binding protein
(filamin) | 0.53



M55545 



Drosophila
subobscura alcohol
dehydrogenase (Adh)
gene, and alcohol
dehydrogenase (Adh-
dup) gene, complete
cds's.



<NONE> 



2134839 



<NONE> 



<NONE> 



bullous pemphigoid antigen 2 -
human



6.2 



0.53 



2136865 



hair keratin cysteine rich protein
- sheep



2.1 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



460 | U19362



461 | AB011527



462 | U64313



463 | AF008590



thermoautotrophicum
methylene-

tetrahydromethanopterin
dehydrogenase
(mtd), 

imidazoleglycerol- 
phosphate 

dehydrogenase 
(hisB), and putative 
ferredoxin (fdxA) 
genes, complete cds, 
orf9 gene, partial cds, 
orfs ... 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



0.53 



DESCRIPTION 



P VALUE 



731969 



HYPOTHETICAL 91.6 KD 
PROTEIN IN HXT8-CRT1 
INTERGENIC REGION
>gi|1078261|pir||S50773
probable membrane protein
YJL212c - yeast
(Saccharomyces cerevisiae)
>gi|496950 (Z34098) ORF
[Saccharomyces cerevisiae]
>gi|1015596 (Z49487) ORF
YJL212C



0.54 



Rattus norvegicus
mRNA for MEGF1,
complete cds 



0.53 



417037 



Bacillus firmus MsyB
gene, 5' upstream
region and partial cds
Caenorhabditis 
elegans paraquat 
responsive protein 
(r e PqM133) mRNA, 



0.52 



<NONE> 



complete cds 



0.52 



<NONE> 



464 



L10245 



465 | AF027173



Mus saxicola
spermidine/spermine
N1-acetyltransferase
(SSAT) gene, 
complete cds. 



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
A) mRNA, complete 
cds 



0.52 



<NONE> 



0.52 



124263 



GERM CELL-LESS PROTEIN 
fruit fly (Drosophila 
melanogaster) >gi| 157490 
(M97933) germ cell-less protein 
[Drosophila melanogaster] 



3e-06 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 

FACTOR IB PRECURSOR 
(IGF-IB) (SOMATOMEDIN C)
>gi|69361|pir||IGHU1B insulin-
like growth factor IB precursor -
human prepropeptide [Homo 
sapiens] 



<NONE> 



7.7 








Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






Caenorhabditis 










466 


AL021066 


elegans cosmid 
H31B20, complete 
sequence 
[Caenorhabditis 
elegans] 


0.52 


2589162 


(D88451) aldehyde oxidase [Zea 
mays] 


O.U 


467 


AF038588 


Porphyra linearis loo 
ribosomal RNA gene, 
3' partial sequence 


0.52 


1055055 


(UmsO) coded for by C.
elegans cDNA yk37g1.5; coded
for by C. elegans cDNA
yk5c9.5; coded for by C.
elegans cDNA ytciay.j,
alternatively spliced form of
F52C9.8b


4.6 




AE001125 


Borrelia burgdorferi 
(section 11 of 70) of
the complete genome 


0.52 


4115827 


(AB021287) polyprotein
[Hepatitis G virus] 


2.0 


469 


AF006640 


Drosophila 
melanogaster Ste20-
like protein kinase 
mRNA. complete cds 


0.52 


1109830 


(U41534) coded for by C.
elegans cDNA CEESI42F;
Similar to helicases of
SNF2/RAD54 family. 
[Caenorhabditis elegans] 


0.002 


470


U90177 


Aplysia californica 
ubiquitin carboxyl- 
terminal hydrolase 
(Ap-uch) mRNA, 
complete cds 


0.51 


<NONE> 


<NONE> 


<NONE> 


471


Z28304 


S.cerevisiae 
chromosome XI 
reading frame ORF 
YKR079c 


0.51 


<NONE> 


<NONE> 


<NONE> 


472 


Z92837


Caenorhabditis
elegans cosmid
R03E1, complete 
sequence 
[Caenorhabditis 
elegans] 


0.51 


123506 


HYDROPHOBIC SEED 
PROTEIN (HPS) 


7.6 


473 


D13803


Mouse mRNA for 
RecA-like protein 
MmRad51, complete 
cds 


0.51 


3327228 


(AB014607) KIAA0707 protein
[Homo sapiens] 


4.5 


474 


X07187 


Pea hsp21 mRNA 


0.51 


3328678 


(AE001299) hypothetical 
protein [Chlamydia trachomatis] 


4.4 






480 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



475 | S63168



CCAAT/enhancer-
binding protein
delta=transcription
factor CRP3 homolog
[human, prostate
carcinoma cell line
LNCaP, Genomic,
1594 nt]



476 | U67078

Xenopus laevis C2-
HC type zinc finger
protein X-MyT1
mRNA, complete cds



Homo sapiens GT198
mRNA, complete
L38933 ORF



478 | AF001000

Lycopersicon
esculentum
polygalacturonase 1



479 | Z28304

S.cerevisiae

chromosome XI

reading frame ORF
YKR079c



Oncorhynchus keta
X97225 IGF-FTaPn.



Homo sapiens RP58
cDNA for complete



P VALUE 



0.51 



0.51 



0.51 



0.50 



0.50 



0.50 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



1653215 



3850320 



3219965 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



DESCRIPTION 



P VALUE



(D90911) apolipoprotein N-
acyltransferase [Synechocystis
sp.]



(AF067520) PITSLRE protein
kinase beta SV2 isoform [Homo

sapiens]

HYPOTHETICAL 100.6 KD 
TRP-ASP REPEATS 
CONTAINING PROTEIN 
C2C6.04C IN CHROMOSOME 



1.2 



0.17 



0.059 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 






ACCESSION 



481 



482 



DESCRIPTION 



AJ001388



M86626 



483 



484 



U76523 



P VALUE I ACCESSION 



cDNA for complete
mRNA



P.occultum 23S
ribosomal RNA,
partial cds.



DESCRIPTION 



P VALUE 



<NONE> 



0.50 



Sambucus nigra lectin
precursor mRNA,
complete cds



Mus musculus striatin
AF031663 mRNA, complete cds



0.50 



485 | U32729



486 



AF067198 



487 | M23442



Haemophilus 
influenzae Rd section 
44 of 163 of the 
complete genome 



Dictyostelium 
discoideum clone 
9.I0Tdd-3 and RED 
repetitive elements. 
partial sequence 
Human interleukin 4
(IL-4) gene, complete
cds



0.50



0.50 



U J 6367 



Caenorhabditis 
elegans POU 
homeobox protein 
CEH-18(ceh-18) 
mRNA, complete cds



AF001000 



218920 



491 | D86983



Lycopersicon 
esculentum 
polygalacturonase 1 



0.50 



0.49 



0.47 



492 



AF064030



Yersinia
enterocolitica wbb
gene cluster
Human mRNA for
KIAA0230 gene,
partial cds



Helianthus tuberosus
lectin 2 mRNA,
complete cds



0.45 



0.41 



0.35 



0.33 



<NONE>



<NONE> 



1722856 



179521 



3875699 



2494740 



<NONE> 



3786409 



<NONE> 



<NONE> 



206712 



<NONE> | <NONE> | <NONE>

CHROMOSOME ASSEMBLY 



PROTEIN XCAP-E African 
clawed frog >gi|563814 
(U13674) XCAP-E [Xenopus
laevis]



<NONE> 



(M63730) BPAG2 [Homo
sapiens]



3.2 



(Z92829) F10A3. 15 

[Caenorhabditis elegans] | 0.65

HYPOTHETICAL 28.3 KD
PROTEIN IN GBD 5' REGION
(ORF4) >gi|2120954|pir||I39562
ORF4 - Alcaligenes eutrophus
>gi|695274 (L36817) ORF4 | 0.008



<NONE> 



(AF098499) contains similarity 
to Saccharomyces cerevisiae
MAPI protein (GB:UI9492)
[Caenorhabditis elegans]



<NONE> 



8.9 



<NONE> 



<NONE> 



(M64793) salivary proline-rich 
protein [Rattus norvegicus] 



<NONE> 



<NONE> 



<NONE> 



<NONE> 







ACCESSION 



DESCRIPTION 



493 | AF067083



Vitreoscilla sp. outer
membrane protein 
homolog gene, 
complete cds; Trp 
repressor binding 
protein gene, partial
cds; and unknown 
genes 



P VALUE 



494 | YI552Q 



Papio hamadryas 
anubis gene encoding 
fertilin alpha- II 



0.33 



495 1 U33475 



496 I D88356 



0.29 



Alestes sp. 
ependymin mRNA, 
partial cds 



497 I U67603 



Mouse DNA for 8- 
oxodGTPase, 
complete cds 



Methanococcus 
jannaschii section 145 
of 150 of the 
complete genome 



498 | U82386 



499 | Z49625



500 I U6433Q 



501 | M24543



Malurus cyaneus 
microsatellite McyU2 



0.28 



0.22 



0.22 



S.cerevisiae 
chromosome X 
reading frame ORF 
YJR125c



0.22 



Dictyostelium 
discoideum AX2 
protein tyrosine
kinase gene, complete
cds.



Human prostate-
specific antigen (PA)
gene, complete cds.



0.21 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



0.21 



0.21 



DESCRIPTION 



401553 



2408049 



3913078 



<NONE> 



2209261 



99263 1 



<NONE> 



<NONE> 



2764859 



P VALUE



HYPOTHETICAL 24.5 KD 
PROTEIN IN NADB-SRMB 
INTERGENIC REGION



ARYL HYDROCARBON

RECEPTOR NUCLEAR 
TRANSLOCATOR 
HOMOLOG (DARNT) 
(TANGO PROTEIN) 
transcription factor [Drosophila 
melanogaster]



<NONE> 



(U51222) p40 [Streptomyces 
halstedii] 



8.3 



3.1 



1.4 



<NONE> 



(U29131) Mg-chelatase subunit 

[Synechocystis sp.]



<NONE> 



8.3 



<NONE> 



<NONE> 



<NONE> 



(X97918) gene 12.1 
[Bacteriophage SPP1]



6.0 









Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION DESCRIPTION 



X876J8 



X71591 



X57808 



B.taurus mRNA for 
thrombospondin 
(partial) 216 2 bi 
B.taurus 

microsatellite 
sequence INRA048 



Human germline 
immunoglobulin 
lambda light chain 
gene



U95098 



Xenopus laevis 
mitotic 

phosphoprotein 44 
mRNA. partial cds 



U84216 



U31463 



508 | X51508



Mycobacterium 
fortuitum plasmid 
pJA238 replication 
protein Rep (rep) 
gene, complete cds 



Rattus norvegicus
nonmuscle myosin 
heavy chain- A 
mRNA, complete cds. 



0.21 



0.21 



0.21 



0.21 



Rabbit mRNA for 

aminopeptidase N 
(partial)



0.21 



509 | AF086476



510 



AF077006 



511 | X75480



Homo sapiens full 
length insert cDNA 
clone ZD88F12 



0.20 



Helicobacter pylori 
plasmid pHPM186, 
complete sequence 



E.gunnii CAD gene. 



0.20 



0.20 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



P VALUE | ACCESSION



DESCRIPTION 



ut)002b protein 



P VALUE 



2146000 



1354453 



2497139 



0.21 | 2499087



3880111 



630864 



<NONE> 



<NONE> 



<NONE> 






Mycobacterium tuberculosis 
tuberculosis] 

>gi|1694863|gnl|PID|e283373
(Z83018) hypothetical protein
Rv2968c [Mycobacterium
tuberculosis]



(U52830) orf [Homo sapiens]



procollagen type V alpha 2 
2119158 Imouse >gi|309181 



PROTEIN IN ABF2-CHL12 
INTERGENIC REGION
gi|1078003|pir||S52835
hypothetical protein YMR075w
- yeast (Saccharomyces
cerevisiae) >gi|763022
(24S952) unknown
[Saccharomyces cerevisiae]



3.5 



GLUCOSE.GLYCOPROTEIN 

GLUCOSYL TRANSFERASE 
PRECURSOR (DUGT) 

glucosyltransferase - fruit fly
(Drosophila sp.)

glucosyltransferase precursor
[Drosophila melanogaster]



2.7



2.0 



(Z8J 130) predicted using 
Genefinder 



LRR47 protein - fruit fly 
(Drosophila melanogaster)
>gi|415947 (X75760) LRR47
[Drosophila melanogaster]



0.003 



0.002 



<NONE> 



1e-06



<NONE> 



<NONE> 
<NONE> 



<NONE> 



<NONE> 






ACCESSION DESCRIPTION P VALUE 



512 | X75036 



513 I D9087S 



T.aestivum 



mitochondrial nad7 
gene for NADH 
dehydrogenase 

subunit 7 

E.coli genomic DNA, 
Kohara clone 

#422(5 5.5-55.8 min.) 

Caenorhabditis
elegans cosmid
F59B8, complete
sequence

[Caenorhabditis
elegans]



M.musculus V alpha
515 I X624g 6 11.1 gene 5'-rP r inn 




U07357 



c-src=pp60c-src,
sdr=src downstream 



Mus musculus Balb/c 
brain-specific kinase 
(Bsk) mRNA. 
complete cds. 



0.20 



0.20 



0.20 



0.20 



0.20 



0.20 



0.20 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



0.20 



0.20 






DESCRIPTION 



<NONE> 



<NONE> 



<NONE> 
<NONE> 



1170683 



3721862



126363 



4159887 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



KINASE ALPHA 

REGULATORY CHAIN, 
SKELETAL MUSCLE 
ISOFORM

(PHOSPHORYLASE KINASE
ALPHA M SUBUNIT)
>gipI35923|pir||I38!ll
phosphorylase kinase (EC
2.7.1.38) - human >gi|791043



(AB016024) Pfj2 [Plasmodium
falciparum] 



LAMININ ALPHA- 1 CHAIN 
PRECURSOR precursor - 
human 



(AC004908) similar to 

ribosomal protein L23a: similar 

to P29316 (PID:g132848)
[Homo sapiens]



206712 



(M64793) salivary proline-rich
protein [Rattus norvegicus]



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



7.4 



1.9 



0.65 



0.52 



0.51 









Nearest Neighbor (BlastN vs. Genbank) 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



521 



AF034460 



Penicillium thomii
internal transcribed 
spacer 1, 5.8S 
ribosomal RNA gene 
and internal 
transcribed spacer 2. 
complete sequence; 
and 28S ribosomal 
RNA gene, partial 
sequence 




522 U95098 



Xenopus laevis 
mitotic 

phosphoprotein 44 
mRNA, partial cds 



523 



524 



X95971 



L41502 



525 



0.20 



114136



0.20



2842674 



S.lividans groEL2 
gene 



Ovis aries 
vasopressin VI 
receptor (V1R) gene, 
complete cds 



J03885 



526 AE001451



K.pneumoniae 
oxalacetate 
decarboxylase alpha 
subunit gene, 
complete cds. 



Helicobacter pylori, 
strain J99 section 12 
of 132 of the 
complete genome 



0.20 



3925277 



0.19 



<NONE> 



0.19 



0.19 



<NONE> 



<NONE> 



AMINO-ACID 

ACETYLTRANSFERASE 
Pseudomonas aeruginosa 
>gi|151036 (M38358) N- 
acetylglutamate synthase
[Pseudomonas aeruginosa] 



UUMA1W ULAii t, 
ASSOCIATING FACTOR 1 (B 
CELL-SPECIFIC 
COACTIVATOR OBF-1) (OCT 
BINDING FACTOR 1) (BOB-
1) (OCA-B) Bob1, B-cell-
specific - mouse
>gi| 188 IS 18|bbs|l 79852
mBob1=B-cell specific
transcriptional coactivator line
J558L, Peptide, 256 aa]
>gi|1353792 (U43788) Oct
binding factor 1 [Mus musculus] 



( ALU j 2 64.?) similar to 
Uncharacterized protein family 
UPF0034, Double-stranded 
RNA binding motif; cDNA EST 
vk4S9b3.5 comes from this 
gene; cDNA EST yk439g7.5 
comes from this gene 
[Caenorhabditis elegans] 



0.39 



0.073 



<NONE> 



<NONE> 



<NONE> 



4e-19 



<NONE> 



<NONE> 



<NONE> 







SEQ 
ID 



ACCESSION 



527 



DESCRIPTION 



P VALUE 



D88084 



Pedicularis
verticillata
chloroplast DNA,
intergenic region
between trnT(UGU)
and trnL(UAA) 5' exon



528 I U67599 



Methanococcus
jannaschii section 141
of 150 of the
complete genome



529 I J05500 



Human beta-spectrin 
(SPTB) mRNA, 
complete cds.



530 | Y10137 



M.mycoides ftsY 
gene homologue and 
gene encoding 
hypothetical protein



531 



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath 
B) mRNA, complete 
AF027174 cds 



532 



Mouse thymic
stromal cell mRNA
for TLSF-beta,
D43805 complete cds



533 I AJ012585 



Tetrahymena 
thermophila 
macronuclear gene 
encoding ribosomal 
protein L3, exons 1-2 



534 | X51475



Brassica napus 5-
enolpyruvylshikimate
3-phosphate synthase
gene



Sambucus nigra 
hevein-like protein
535 | AF074386 mRNA, complete cds



536 



S.cerevisiae 
chromosome X 
reading frame ORF 
Z49625 YJR125c



0.19 



0.19 



0.19 



0.19 



0.19 



0.19 



0.19 



0.19 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



0.19 



0.19 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



DESCRIPTION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 







543 Z49835



544 



H.sapiens mRNA for 
protein disulfide 
isomerase 



0.19 



Spinacia oleracea
heat shock 70 protein 
AF035458 protein, complete cds 



545 



546 



Tetrahymena 
thermophila B
internal deletion

U23441 sequence.

Pneumocystis carinii
major surface
U53921 glycoprotein



0.19 



0.19 



0.19



2113940 



267293 



(Z95556) hypothetical protein 

Rv2507 

PROBABLE E4 PROTEIN 
papillomavirus (type 1)
>gi|61015 (X62844) E4 gene
product [Pygmy chimpanzee 
papillomavirus type 1] 



3877185 



3548901 



547 



Rat ankyrin binding 
glycoprotein-1 related
LU002 mRNA sequence. 



548 



Methanococcus
jannaschii section 102
of 150 of the
U67560 complete genome



0.19 



3337352 



(Z66563) F46C3.2 
[Caenorhabditis elegans] 
(AF052502) DA26 homolog 
[Epiphyas postvittana 
nucleopolyhedrovirus]



(AC0044S1) putative chromatin 
structural protein Supt5hp 



549 



U18424



Mus musculus 
bacteria binding 
macrophage receptor 
MARCO mRNA,
complete cds.



0.19 



3183689 



(Y135S5) serotonin receptor 4 
[Cavia porcellus]



0.19 



3659853 



(AF0S90S3) complement 
component C1qB like protein



9.3 



9.3 



9.1 



8.7 



7.1






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



550 I X66467 



P VALUE 






552 | J05087 



553 I AF08Q464 



554 | U78876



C.albicans sec18 gene

Syngaster lepidus 16S
ribosomal RNA gene,
partial sequence



Rat calmodulin- 
sensitive plasma 
membrane Ca2+- 
transporting ATPase
(PMCA3) mRNA,
complete cds. 



0.19 



0.19 



Homo sapiens 
glutamate 
oxaloacetate 
transaminase



0.19 



555 | AB009077



Human MEK kinase 
3 mRNA, complete 

cds 

Vigna radiata mRNA 
for proton 

pyrophosphatase. 
complete cds 



0.19 



556 1 U95098 



Xenopus laevis 
mitotic 

mRNA. partial cds 



0.19 



0.19 



0.19 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



557 I AE000392 



Escherichia coli K-12 
MG1655 section 282 
of 400 of the 
complete genome 



0.19 



ACCESSION 



1326385 



3122039 



422462 



3024834 



1710445 



3256922 



4226159 



DESCRIPTION 



(U3S75I)C07G1.7 



P VALUE 



3645960 



product [Caenorhabditis
elegans]

DIHYDROPYRIMIDINASE

(DHPASE) dihydropyrimidinase
rat

>gi|1378019|gnl|PID|d1010479



6.9 



hypothetical protein - fruit fly 
(Drosophila melanogaster) 
>gi|296434 (X68408) ORF 
[Drosophila melanogaster]

PROBABLE E4 PROTEIN 
>gi|790898 position 3286.J288 
is first start codon; putative 



(U7S083) unknown [Emericella 



nidulans]



(AP000002) 256aa long 
hypothetical protein 
[Pyrococcus horikoshii]



(AF125463) contains similarity 
to BTB (also known as BR- 
C/Ttk) domains (Pfam:PF00651,

score=62.8, E=7.6e-15, N=1)
[Caenorhabditis elegans]

(AL031583) 1- 

evidence=predicted by content; 
1-method=genefinder;084; 1-
method_score=47.46; 1-
evidence_end; 2-

evidence=predicted by match; 2-

match_accession=SWISS-
PROT:P23792; 2-

match_description=DISCONNE
CTED PROTEIN.: 2-matc...



6.9 



5.3 



5.3 



5.3 



5.1 



4.1 



4.0 






Nearest Neighbor (BlastN vs. Genbank) 



SEQ 

ACCESSION



DESCRIPTION I P value ACCESSION 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



558 | AE000392



Escherichia coli K-12 
MG1655 section 282
of 400 of the 
complete genome 



559 



560 



L8I774 



AL021108 



561 



Homo sapiens 
(subclone 3_dl from 
PIH25) DNA 
sequence 



Drosophila 
melanogaster cosmid 
clone 137E7 



AB001510 



Carabus 
leptoplesioides 
mitochondrial DNA 
for NADH 
dehydrogenase 
subunit 5, partial cds 



Egernia stokesii clone



AF069696 EST1 microsatellite



X64144 



U56897 



F.pringlei ppcA1
gene for

phosphoenolpyruvate

carboxylase
Human

immunodeficiency 
virus type 1 gag 
polyprotein (gag) 
gene, partial cds 



DESCRIPTION 



565 U57975 



Danio rerio Notch 
homologue 3 mRNA, 
complete cds



0.19 



P VALUE 



3645960 



0.19 



4001725 



0.19 



4001688 



(AL031583) 1-

evidence=predicted by content;
1-method=genefinder;084; 1-
method_score=47.46; 1-
evidence_end; 2-
evidence=predicted by match; 2 
match_accession=SWISS- 
PROT:P23792; 2- 

match_description=DISCONNE 
CTED PROTEIN.: 2-matc... 



(AB015981) MnhA
[Staphylococcus aureus] 



(AB015718) protein kinase
[Homo sapiens]



0.19 



0.19 



315SSS5 



3328994 



(Z98551) MAL3P6.11
[Plasmodium falciparum]
(AE001326) Amino Acid
(Branched) Transport
[Chlamydia trachomatis]



0.19 



3242974 



(AF069555) G protein-coupled 
receptor p2y3 [Meleagris 
gallopavo]



0.19 



2257710 



(U73041) resolvase-like protein
[Thiobacillus ferrooxidans]



0.19 



3874971 



K/.yy IW) similar to NAD
dependent

epimerase/dehydratase family;
cDNA EST EMBL:C10103
comes from this gene; cDNA
EST EMBL:D66400 comes
from this gene; cDNA EST
EMBL:D70143 comes from this
gene; cDNA EST yk493h11.3
comes from ... 



3.0 



3.0 



2.4 



2.4 



2.3 



1.3 







Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



DESCRIPTION 



R.norvegicus mRNA 
566 | Y125Q2 for factor XIIIa



2133693 



567 1 S82470 



BBI=malignant cell 

expression-enhanced 
gene/tumor 

progression-enhanced
gene [human, UM-
UC-9 bladder
carcinoma cell line,
mRNA, 1897 nt]



568 | U97408 



Caenorhabditis 
elegans cosmid 
F48A9 



569 | U10470



Pseudomonas 
fluorescens PHA 
depolymerase (pha2) 
gene, complete cds.



Ovis aries MAF214
locus polymorphic
570 | M88160 dinucleotide repeat.

fl lUltlii UliJLUi 

for pollen 
[allergen (Hoi I 2. 
group II) > :: 

|emb|AJ131339|LIT13 
1339 Lolium italicum 
| mRNA for pollen 
lallergen (Lol i 2, 
group II) > allergen 
(Poa p 2. group II) > 



571 | AJ131336



emb|AJ131338|TAE1
31338 Triticum
aestivum mRNA for
pollen allergen (Tri a
2, group II)



572 | X84036 



S.cerevisiae ARG8 
and CDC33 genes



0.19 



2444026 



0.19 



542433 



0.19 



3721862 



0.19 



1293816 



0.19 
0.19 



masquerade precursor - fruit fly
(Drosophila melanogaster)
>gi|665545 (U18130)
masquerade [Drosophila
melanogaster]
>gi| 1095942|prf|p 1 1Q286A
masquerade gene



P VALUE 



3880447
3882041 



(U77783) N-methyl-D-aspartate 
receptor 2D subunit precursor 
[Homo sapiens]



1.8 



1.8 



225 K protein - Babesia bovis 
(fragment)



(AB016024) PQ2 [Plasmodium 
falciparum]



(U56963) T13A10.5 gene
product [Caenorhabditis
elegans]



1.8 



1.7 



1.4 



(AL032675) predicted using 
Genefinder

(AJ010405) hypothetical protein 



0.82 



0.62 









SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



P VALUE 



Human WT) protein 



mucin - human >gi|501033



573 



U57058 



DUO pre-mRNA, 
partial cds 



0.19 



631302 



574 



575 



AF034460 



U95098 



(U14383) mucin [Homo 
sapiens] 



Penicillium thomii 
internal transcribed 
spacer 1, 5.8S 
ribosomal RNA gene 
and internal 
transcribed spacer 2, 
complete sequence; 
and 28S ribosomal 
RNA gene, partial 
sequence 



Xenopus laevis 
mitotic 

phosphoprotein 44 
mRNA, partial cds 



0.19 



114136 



0.19 



AMINO-ACID 
ACETYLTRANSFERASE 
Pseudomonas aeruginosa 
>gi|151036 (M38358) N-
acetylglutamate synthase 
[Pseudomonas aeruginosa] 



105270 



alpha-2radrenergic receptor - 
human name 'ADRA2R' [Homo 
sapiens] 



0.60 



0.35 



0.27 



576 



AG001475



Homo sapiens 
genomic DNA, 21q
region, clone: 
125H6N2 



0.19 



94977 



hypothetical protein 3 - 
Pseudomonas sp. (DSM 6898) 
plasmid pKB740 >gi|45867 
(X66604) ORF3 



0.16 



577 



M63284 



Mouse IgG receptor
(beta-Fc-gamma-RII)
gene, exons 9 and 10, 
clones lambda- 
Fc(3.2,93). 



0.19 



3024681



TRANSCRIPTION
INITIATION FACTOR TFIID
135 KD SUBUNIT (TAFII-135)
(TAFII135) (TAFII-130) of
RNA polymerase II transcription
factor TFIID [Homo sapiens]



578 



579 



U38241 



Pseudomonas
aeruginosa orotate
phosphoribosyl
transferase (pyrE), 
catabolite repression 
control protein (crc) 
and RNasePH (rph) 
genes, complete cds 



0.19 



AF039734 



Lontra longicaudis
transthyretin intron 1,
partial sequence



580 U95094 



Xenopus laevis XL-
INCENP (XL-
INCENP) mRNA,

complete cds 



0.19 



0.19 



3044086 



322759 



(AF055904) unknown 
[Myxococcus xanthus] 



0.088 



0.052 



pistil extensin-like protein 
(clone pMGU) - common 
tobacco (fragment) >gi|19927 
(Z14015) pistil extensin like
protein [Nicotiana tabacum]



0.030 



2147194 



collagen - Paralvinella grasslei



0.002






ACCESSION 



DESCRIPTION 



P VALUE 



Drosophila
melanogaster mRNA 
for DAD polypeptide, 
581 I AB0Q4232 complete cds 



582 | AF098919 



Gallus gallus alpha-
globin gene domain 5' 
region 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



2498765 



0.19 



1086863



P VALUE 



PEROXISOMAL MEMBRANE 
PROTEIN PEX16 lipolvtical 



(U41272) T03GU.6 gene 
product [Caenorhabditis
elegans]



583 



Helicobacter pylori, 
strain J99 section 18
of 132 of the 
AE001457 complete genome 



584 I L10329 



585 | AE001155



Plasmid RP4 tra£ 
gene, 3' end; traD 
gene, complete cds; 
trap gene, 5' end. 



0.19 



2924552 



Borrelia burgdorferi 
(section 41 of 70) of 
the complete genome



0.19 



3878117



586 | U49979 



587 | U88155 



588 I AF061854 



589 I M23865 



Orf virus E10R
homolog gene, partial 
cds, and DNA 
polymerase gene, 
complete cds 



(AL022O18) 1- 

evidence=predicted by content; 
1-method=genefinder;084; 1-
method_score=165.48; 1-
evidence_end; 2- 
evidence=predicted by match; 2- 
match_accession=AA264666; 2- 
match_description=LD0835 1.5p 
ime LP Drosophila melanoga. 



(Z49068) mitochondrial carrier 
protein 



0.002 



4e-05 



3e-05 



0.19 



Xenopus laevis 
RanGTPase 
activating protein 



3850072 



0.19 



995714 



Schizosaccharomyces
pombe Clr4p (clr4)
gene, complete cds
S.cerevisiae CHS2 
gene encoding chitin 
synthase. 



(U28739) similar to TPR 
domains in e.g. yeast STI1 
protein [Caenorhabditis elegans]



8e-07 



(AL033385) dna-directed rna
polymerase iii subunit 
[Schizosaccharomyces pombe] 



[X9125S) pid:e 198503 
Saccharomyces cerevisiae] 



0.19 



0.18



3242750 



<NONE> 



(AC005164) match to ESTs
AA731149 (NID:g2140138),
AA731908 (NID:g2752719),

AA287837 (NID:g1933519),
AA262811 (NID:g1898382),
and AA825820 (NID:g2899132)



2e-12



le-15 



4e-16 



5e-19



<NONE> 



<NONE> 










SEQ 

ID | ACCESSION



Nearest Neighbor (BlastN vs. Genbank) 



DESCRIPTION 



Xenopus laevis XL-
INCENP (XL-
INCENP) mRNA,
590 I U95094 complete cds 



591 



592 



593 



Caenorhabditis 
elegans cosmid 
AF067610 F41A4

Homo sapiens
gonadotropin-
releasing hormone
precursor, second
form (GnRH-II) gene,
AF036329 complete cds



0.18 



0.18 



H.sapiens 
mitoxantrone- 
resistance associated
Z49216 mRNA 



0.18 



594 



595 



X02167 



Torulopsis glabrata
mitochondrial DNA
for tRNA-Thr,-His
and -Glu upstream of
cytochrome b gene



0.18 



Z3156I 



R.communis
(Carmencita) Scr1
mRNA for sucrose 
carrier 



596 



LSI 692 



597 1 X57310 



598 



Homo sapiens 
(subclone 2_c9 from 
PI H56) DNA 
sequence 



0.18 



0.18 



Nocardia 

lactamdurans pcbAB 
and pcbC genes for 
alpha-aminoadipyl-L- 
cysteinyl-D-valine
synthetase and 
isopenicillin N 
synthase 



0.18 



U13315 



Sus scrofa 
parathyroid receptor
(PTH) mRNA, 
complete cds 






Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



0.18



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



1346575 



55 KD ERYTHROCYTE 
MEMBRANE PROTEIN 



126404 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



8.4 



1022323 



SEED LIPOXYGENASE-2 (L- 
2) soybean >gi|170014 (J03211)
lipoxygenase (EC 1.13.11.12) 



(X04647) collagen alpha-2(IV) 
chain [Mus musculus] | 3.8 









SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



599 



602 



Plasmodium 
falciparum DNA ***
SEQUENCING IN
PROGRESS ***
from contig 3-85,
AL010158 complete sequence



DESCRIPTION 



0.18 



2506816 



Bos taurus mRNA for
thrombospondin 1,

600 | AB005287 complete cds

Drosophila
melanogaster cosmid

601 | AL021108 clone 137E7 



0.18 



214 



Danio rerio Notch 
homologue 3 mRNA, 
U57975 complete cds 



0.18 



3483032 



0.18



85719 



603 I M30124 



604 I X54965 



605 | U95098 



aeruginosa 
autonomously 
replicating sequence. 



G.sp alpha 5HR DNA 



0.18 



3878017 



Xenopus laevis
mitotic 

phosphoprotein 44 
mRNA. partial cds 



606 I U20793 



Oryctolagus 
cuniculus renal 
sodium-dependent 
phosphate transporter 
type II mRNA,
complete cds.



0.18 



134304 



0.18



1628403 



0.18



P VALUE 



PRECURSOR 
PROTEOGLYCAN CORE 
PROTEIN 2) (GLIAL 

HYALURONATE-BINDING 
PROTEIN) (GHAP) >gi|608515
(U16306) chondroitin sulfate
proteoglycan versican V0 splice-
variant precursor peptide
'2 b protein 

Mycobacterium tuberculosis 
tuberculosis] 

>gi|1694863|gnl|PID|e283373
(Z83018) hypothetical protein
Rv2968c [Mycobacterium 
tuberculosis] 



(AL03I37I) hypothetical 
protein SC4G2.06 
[Streptomyces coelicolor]



collagen alpha l'(H) chain 

(ALU2 1 381) similar to Zinc
finger, C4 type (two domains);
cDNA EST yk452f4.5 comes
from this gene; cDNA EST
EMBL:T00774 comes from this
gene receptor NHR-3
[Caenorhabditis elegans]



STEM CELL PROTEIN 
chicken >gi|62845 (X63371) 
transforming capacity [Gallus
gallus]



1705984 



(X9SS93) hTAFII68 [Homo 
sapiens] splicing [Homo 
sapiens] 



92 KD TYPE IV 
COLLAGENASE 
PRECURSOR IV, 92K, 
precursor - rat >gi| 1022784 
(U36476) 92-kDa type IV
collagenase [Rattus norvegicus] 



3.7 



2.9 



2.9 



1.7 



1.3 



1.3 



1.3 






1.2 






ID | ACCESSION



607 



DESCRIPTION 



P VALUE 



U23427 



608 



U49953 



Human 

cholecystokinin type
A receptor (CCK-A)
gene, exons 1 and 2. 



609 



610 



JOOI82 



Rattus norvegicus 
protein kinase MUK2 
mRNA, complete cds 



0.18 



Human alpha globin 
gene cluster on 
chromosome 16: zeta 
gene. 



X625I3 



611 | X04862



612 



M12450



M.gallopavo gene for 
metallothionein 



613 



AF038539 



Goat embryonic alpha 
globin gene zeta 
exons 2-3

vitamin D 
binding protein 
(DBP) mRNA, 
complete cds. 



Mus musculus muscle 
NSP-like 1 (NspII) 
mRNA, complete cds 



614 



X78401 



615 I D38754 



Bacteriophage P22 
right operon, orf 48. 
replication genes 18 
and 12, nin region
genes, ninG 
phosphatase, late 
control gene 23, orf 
60, complete cds, late 
control region, start 
of lysis gene 13 



0.18 



0.18 



0.18 



Pig mRNA for inter- 
alpha-trypsin 
inhibitor heavy-chain 
~I1. complete cds 



0.18



0.18



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
ACCESSION 



DESCRIPTION 



3261734 



551238 



1585259 



2494740 



86837 



4210432 



3297877 



1123087 



1397275 



(Z94752) hypothetical protein 
Rv1004c



(XS1S47) pectate lyase 1 
[Erwinia carotovora]



traJ gene [Amycolatopsis 
methanolica] 

HYPOTHETICAL 28.3 KD
PROTEIN IN GBD 5' REGION
(ORF4) >gi|2120954|pir||I39562
ORF4 - Alcaligenes eutrophus
>gi|695274 (L36817) ORF4



androgen receptor B - human 



(AJ130783) APC2 protein [Mus 
musculus]



(AJ224S6S) GNAS1 [Homo
sapiens]



(U42436) C49H3.3 gene
product [Caenorhabditis 
elegans] 



0.43 



0.41 



0.31 



0.082 



0.038 



0.029 



0.009 



(U61947) C06G3.S gene 
product [Caenorhabditis 



elegans]



7e-06 






Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



616 | X51508 



P VALUE



Rabbit mRNA for 
aminopeptidase N 
(partial) 



617 I X54350 



S.kluyveri linear 
plasmid pSKL DNA 
for open reading 
frames 1-10



618 



Human peripheral 
benzodiazepine
L21954 receptor gene, exon 4



Oryctolagus
cuniculus protein
phosphatase 2A1 B
gamma subunit
(skeletal muscle



isolate) mRNA, 
619 1 U09355 complete cds. 



620 | X58715 



621 I AF060195 



622 I L27235 



T.cruzi hsp70 mRNA 
for 70 kDa heat shock
protein, partial cds

Mus musculus 
proteasome regulator 
PA28 beta subunit 
gene, complete cds 



Methylobacterium 
extorquens serine 
cycle proteins 



ACCESSION 



DESCRIPTION 



0.18 



630864 



0.18 



3183405 



3925211 



0.18



3947877 



0.18 



30240S1 



0.18



861276 



0.18



2688949 



P VALUE 



(Drosophila melanogaster) 
>gi|415947 (X75760) LRR47



PROTEIN C2C6.07 IN
CHROMOSOME I

>gi|2370504|gnl|PID|e339194
pombe]

>gi|3451305|gnl|PID|e1316730
(AL031324) very hypothetical
protein [Schizosaccharomyces
pombe]



iWbj cUNA hVI 

EMBL:D70654 comes from this 
gene; cDNA EST 
EMBL:Z14359 comes from this 
gene; cDNA EST 
EMBL:D33409 comes from this 
gene; cDNA EST 
EMBL:D36239 comes from this 
gene; cDNA EST 
EMBL:Z14766 comes from this



6e-07 



gene. 



(AL0343S2) putative mitosis 
and maintenance of ploidy



protein [Schizosaccharomyces

pombe]

MYOSIN LIGHT CHAIN 
KINASE, SMOOTH MUSCLE 
AND NON-MUSCLE 
ISOZYMES (MLCK) 
(CONTAINS: TELOKIN) 



4e-09 



8e-11



(U28739) similar to TPR 
domains in e.g. yeast STI1
protein [Caenorhabditis elegans]



(AF02720S) AC133 antigen



9e-12



1e-14



1e-14






SEQ 

ID | ACCESSION



Nearest Neighbor (BlastN vs. Genbank)



623 



624 



625 



DESCRIPTION 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



AF006573 



AFOOI782 



Drosophila virilis 
maltase 1 (Mavl) and 
maltase 2 (Mav2) 
genes, complete cds 



Staphylococcus 
aureus strain SA502A 
AerB 



P VALUE I ACCESSION 



626 



AJ223364 



J03059 



Homo sapiens germ- 
line DNA upstream of 
Jkappa locus 
Human 

glucocerebrosidase 
(GCB) gene, 
complete cds 



627 



628 



AB008860



Fugu rubripes Cal2 
gene for pheromone 
receptor, complete 
cds 



AF027174 cds 



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
B) mRNA, complete 



629 1 AF059650 



0.17 



0.17 



0.17 



0.17 



Homo sapiens histone 
deacetylase 3
(HDAC3) gene, 
complete cds 



0.17 



0.17 



2500558 



<NONE> 



<NONE> 



<NONE> 



2198849 



539355 



482118 



DESCRIPTION 



P VALUE 



PUTATIVE RIBONUCLEASE
III (RNASE III)

>gi|3876420|gnl|PID|e1346063
(Z81070) similar to ribonuclease
[Caenorhabditis elegans]



<NONE> 



2e-23 



<NONE> 



<NONE> 



<NONE> 



kAHJU4yUUJ hikAKP IHomo ' 
sapiens] >gi|2665826 
(AF035771) Na+/H+- exchanger 
regulatory factor 2 [Homo 
sapiens] factor 2 [Homo 
sapiens] 

>gi|3618353|gnl|PID|d1034182
exchanger isoform A3 [Homo

sapiens]



<NONE> 



<NONE> 



7.8 



SCD25 protein (version 1) - 
yeast



7.5 



hypothetical protein C15H7.1 
Caenorhabditis elegans










Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



630 



Homo sapiens histone
deacetylase 3 
(HDAC3) gene, 
AF059650 complete cds 



631 1 X55065 



632 | U15280 



Chinese hamster

metallothionein II
gene

Rattus norvegicus
oxytocin receptor
(OTR) gene, exon 3
and complete cds
Goat embryonic alpha
globin gene zeta 

exons 2-3 
Plasmodium 

falciparum DNA *** 

SEQUENCING IN 
PROGRESS *** 
from contig 4-09, 

634 | AL010222 complete sequence

H.sapiens mRNA for

635 | X60111 MRP-1



0.17 



465932 



0.17 



3687237 



0.17 



633 | X04862 



542565 



0.17 



86837 



Orf virus E10R
homolog gene, partial
cds, and DNA
polymerase gene,
636 | U49979 complete cds



0.17 



0.17 



1177322



3237306 



0.17 



3850072 



PROTEIN F58A4.11 IN
CHROMOSOME III

>gi|3874287|gnl|PID|e1344088
EST EMBL:C12577 comes
from this gene; cDNA EST
yk227e7.5 comes from this
gene; cDNA EST yk303d1.5
comes from this gene; cDNA
EST yk314c12.5 comes from
this gene; cDNA ...
EMBL:C11886 comes from this
gene; cDNA EST
EMBL:C12577 comes from this
gene; cDNA EST yk227e7.5
comes from this gene; cDNA
EST yk303d1.5 comes from this
gene; cDNA EST yk314c12.5



comes from this gene; cDNA



4.4 



(AC005169) putative Cys3His
zinc-finger protein



cyclin E type II - fruit fly
(Drosophila melanogaster)
>gi|429168 (X75027)
Drosophila cyclin E type II
[Drosophila melanogaster]



1.5 



androgen receptor B - human 



0.45 



(X95466) CPG2 protein [Rattus 
norvegicus] 

>gi|1588593|prfI|2208498A 
plasticity-related gene [Rattus 
norvegicus] 



0.080 



(U92715) breast cancer 
antiestrogen resistance 3 protein 



(AL033385) dna-directed rna 
polymerase iii subunit 
[Schizosaccharomyces pombe]



7e-07



3e-09 



7e-15








Nearest Neighbor (BlastN vs. Genbank)

Nearest Neighbor (BlastX vs. Non-Redundant Proteins)

SEQ
ID

ACCESSION

DESCRIPTION

P VALUE

ACCESSION

DESCRIPTION

P VALUE
















637 


AF006573 


Drosophila virilis 
maltase 1 (Mav1) and
maltase 2 (Mav2)
genes, complete cds 



0.17 


2500558


PUTATIVE RIBONUCLEASE 
III (RNASE III)
>gi|3876420|gnl|PID|e1346063
(Z81070) similar to ribonuclease
[Caenorhabditis elegans]


2e-29 


638 


AE001141 


Borrelia burgdorferi 
(section 27 of 70) of 
the complete genome 


0.16 



1850327 


(U52370) fertilin beta [Homo 
sapiens] 


2.3 


639 


M72980 


Anthonomus grandis 
vitellogenin gene
(VTG), complete cds. 



0.12 


3242750 


(AC005164) match to ESTs
AA731149 (NID:g2140138),
AA731908 (NID:g2752719),
AA287837 (NID:g1933519),

and AA825820 (NID:g2899132)


2e-56 


640 


AF023532 


Simulium vittatum 
ATPase 6 gene, 
mitochondrial gene 
encoding 
mitochondrial 
protein, partial cds 


0.11 


<NONE> 


<NONE> 


<NONE> 




U76523 


Sambucus nigra lectin
precursor mRNA,
complete cds 


0.10 


3482965 


(AL031369) putative protein


0.49 


642 


AJ001596 


Danio rerio mRNA 
for opioid receptor 
homologue


0.099 


1706694 


1w/\JNV_/o I Ci\UL j Hi 1 flAjc, 

5.4.99.7) - fission yeast 
(Schizosaccharomyces pombe) 


2.3 


643 



U26341 


Oryctolagus 
cuniculus Na and Cl
dependent betaine
transporter mRNA,
complete cds.


0.099 


2645804 


(AF033381) betaine
homocysteine methyltransferase
[Mus musculus]


0.59 


644 


M11633


Bacteriophage Cp-5
(S. pneumoniae) 3'
inverted terminal
repeat.


0.082 


2314695 


(AE000649) type IIS restriction
enzyme R and M protein


4.3 


645 


< 

c 

X74103


Streptomyces sp.
gene for alkaline
serine protease I


0.073 


1314734 


(U54641) 220 kDa silk protein
[Chironomus thummi]


6.3 









SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



646 



DESCRIPTION 



Z72509 



647 | X70282



Caenorhabditis



p value 



elegans cosmid 
F32G8. complete 
sequence 
[Caenorhabditis 
elegans] 



0.072 



X.laevis xanf-1 gene
Human DNA



sequence from 
cosmid EI41E2, on 
chromosome 22, 
complete sequence 
[Homo sapiens]

Drosophila virilis 
retrotransposon Tvl, 

649 | AF056940 complete sequence

Homo sapiens 

650 | AJ001151 genomic sequence

Bacteriophage BF23 

651 | X54455 gene 17 and gene 18

P.pinea internal 
transcribed spacers 1 

& 2 of ribosomal 

652 | X87936 DNA 



Dictyostelium 

discoideum TipD 

(tipD) gene, complete 
653 | AF019236 cds



O.cuniculus mRNA 
654 | X90592 for pS3 protein



Mus musculus
putative T1/ST2
receptor binding
protein precursor
655 | U41805 mRNA, partial cds



656 | AB007881



657 I AL0102H 



Homo sapiens
KIAA0421 mRNA,
partial cds



Plasmodium
falciparum DNA *** 

Sequencing in 
progress *** 

from contig 3-109. 
complete sequence 



0.070 



0.069 



0.069 
0.068 
0.067 



0.067 



0.067 



0.067 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



0.067 



0.066 



0.066 



ACCESSION 



<NONE> 



3851202 



<NONE> 



2246532 
<NONE> 
<NONE> 



2459733 



3882275 



1703275 



DESCRIPTION 



642518 



<NONE> 



<NONE> 



P VALUE 



<NONE> 



(AC005954) ZO-3 [Homo
sapiens] [Homo sapiens]



<NONE> 



0.40



<NONE> 



(U93872) ORF 73, contains
large complex repeat CR 73



<NONE> 



<NONE>

(U95374) aldehyde 

dehydrogenase [Haloferax 
volcanii] 



<NONE> 



5e-12
<NONE> 



<NONE> 



(AB018320) KIAA0777 protein
[Homo sapiens]



METHIONINE
AMINOPEPTIDASE 2
(METAP 2) GLYCOPROTEIN)
(P67)



(U17326) neuronal nitric oxide
synthase [Homo sapiens]



4.3 



1.1



0.29 



0.29 



<NONE> 



<NONE> 



<NONE> 



<NONE> 






Nearest Neighbor (BlastN vs. Genbank)



SEQ 

ACCESSION



DESCRIPTION 



658 



P VALUE 



Homo sapiens mRNA
for KIAA0646
AB014546 protein, complete cds



659 AF104156



Rattus exulans isolate
huahine30
mitochondrial D-
loop, partial sequence



0.066 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)

ACCESSION 



DESCRIPTION 



1082461 



660 X97581



661 | D85378



M.musculus mRNA 
for spalt transcription 
factor 



0.066 



1002380 



662 M97561 



663 AE001373 



Human clone H20 N-
acetylglucosaminyltra
nsferase III DNA,
exon 2



0.066 



4107313 



P VALUE 



homeotic protein HB9 - human 



(U24189) RRM-type RNA
binding protein [Caenorhabditis
elegans]



(AL035075) putative myosin
heavy chain 



0.38 



0.29 



664 S75479 



665 AF032922 




Human (clone 

LA 179) chromosome 

21 sequence. 

Plasmodium

falciparum 

chromosome 2, 

section 10 of 73 of 

the complete 



sequence 

growth hormone 
receptor, growth 
hormone binding 
protein {GHR/BP 
gene) (mice, C57 
black/6. Genomic, 
179 nt. segment 8 of 



,666 | SS09S6 



Homo sapiens
syntaxin 4 binding
protein UNC-18c
(UNC18c) mRNA,
complete cds



Svpl40J=svp-related 
nuclear 

receptor/retinoid 
signaling modulator 
[zebrafishes, mRNA. 
3S76 nt 1 



0.065 



0.065 



3061308 



(AB006074) topoisomerase III
[Mus musculus]



132628S 



(U5S734) weak similarity to 
ankyrin G [Caenorhabditis 
elegans] 



0.82 



0.12 






Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



667 | X59552 



G.domesticus mRNA
for ventricular myosin
heavy chain



Anthonomus grandis
vitellogenin gene
668 | M72980 (VTG), complete cds.



669 



Homo sapiens mRNA
for KIAA0646
AB014546 protein, complete cds



Sheeppox virus strain
KS-1 ORF HM1
gene, partial cds;
ORF HM2 and ORF
HM3 genes, complete
cds; and ORF HM4
gene, partial cds



671 1 268013 



672 I ARM 1332 



673 | J00451



Caenorhabditis
elegans cosmid
W02H3, complete
sequence

[Caenorhabditis
elegans]

Bodo saltans 
unknown mRNA, 
kinetoplast gene 
encoding kinetoplast 
protein, complete cds 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
ACCESSION 



0.065 



2497098 



0.065 



3242750 



0.064 



<NONE> 



<NONE> 



0.064 



<NONE> 



Mouse germline IgG- 
3 chain gene, D-J-C 
region, and switch 
region. 



0.064 



<NONE> 



0.064 



<NONE>



DESCRIPTION 



P VALUE 



INTERGENIC REGION
>gi|1077180|pir||S49745
probable membrane protein
YML034w - yeast
(Saccharomyces cerevisiae)
>gi|575685 (Z46659) unknown
orf, len: 656, CAI: 0.13
[Saccharomyces cerevisiae]

(AC005164) match to ESTs
AA731149 (NID:g2140138),
AA731908 (NID:g2752719),
AA287837 (NID:g1933519),
AA262811 (NID:g1898382),
and AA825820 (NID:g2899132)



0.014 



5e-33 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 






Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION | DESCRIPTION | P VALUE | ACCESSION



674 U41289 



675 M37395 



676 215030 



Dictyostelium
discoideum K7
kinesin-like protein
mRNA, complete cds



L.lactis (strain SK11)
proteinase plasmid 
pSKlilDNA, 
complete cds. 



677 212021 



H.sapiens gene for 
ventricular myosin 
light chain 2 > :: 

gb|L01652|HUMVM 
LC Human 
ventricular myosin 
light chain 2 gene, 
seven exons. 



0.064 



G.max gene for 
catalase 



678 I L05668 



Entamoeba histolytica 
protein 

serine/threonine 
kinase (pstkl) gene. 
complete cds. 



0.064 



0.064 



0.064 



0.064 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



DESCRIPTION 



993019 



730343 



24987 LI 



733140



P VALUE 



348297 2 (AL031369) putative protein 



(X87246) alternative start codon
[Pseudorabies virus]



PRECURSOR (PRL-R) mouse
>gi|220576|gnl|PID|d1001535
(D10214) prolactin receptor
precursor [Mus musculus]
>gi|293770 (L14811) prolactin
receptor [Mus musculus]
>gi|347842 (L13593) prolactin
receptor [Mus musculus]
receptor: ISOTYPE=long form
[Mus musculus]



ORIGIN RECOGNITION
COMPLEX PROTEIN,
SUBUNIT 2 >gi|1185461
(U3S472) essential ORC2-
related fission replication factor
Orp2 [Schizosaccharomyces
pombe]



(U22453) carboxypeptidase 
[Simulium vittatum] 



9.2 



9.1 



5.3 



5.3 






Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



679 | U50715



680 | Z28182 



681 | M29917



682 | AB020709



Mus musculus alpha
galactosidase A gene, 
complete cds 



S.cerevisiae 
chromosome XI 
reading frame ORF 
YKL182w 



Human ornithine
aminotransferase



gene, exon 1.



Homo sapiens mRNA



0.064 



0.064 



0.064 



683 | AB014546



684 | AB010427



685 



U34774 



for KIAA0902
protein, complete cds



Homo sapiens mRNA
for KIAA0646
protein, complete cds
Homo sapiens mRNA
for NORM, complete
cds



0.064 



Orf virus ankyrin-like
repeat protein, F11L
homolog, and F12L
homolog genes,
complete cds.



0.064 



0.064 



686 | AF022861



Mus musculus
neuropilin-2(ai)
mRNA, alternatively
spliced, complete cds



0.064 



0.064 



125398 



1079035 



2317934 



861404 



1708118 



2388676 



731668 



4091978 






DESCRIPTION 



P VALUE



HYGROMYCIN-B KINASE
(HYGROMYCIN B 

PHOSPHOTRANSFERASE) 
(APH(7")) 

>gi|66885|pir||WGSMHH 
hygromycin B 

phosphotransferase (EC 2.7.1.-) 
Streptomyces hygroscopicus 
>gi|581682 (X03615) pot. hyg 
protein [Streptomyces 
hygroscopicus] 
phosphotransferase [synthetic 
construct] >gi|2739064 cloning 
vector] >gi|2739068 
(AF025747) hygromycin B
phosphotransferase [unidentified
cloning vector]
Om(2D) protein - fruit fly
(Drosophila ananassae)

>gi|443770|gnl|PID|d1006095
(D26553) ORF



2.3 



1.8 



(U97553) unknown [murine
herpesvirus 68]



product [Caenorhabditis
elegans]



HOMEOBOX PROTEIN HB9
>gi|507425

(AF015539) precollagen P
[Mytilus edulis]



SSF1 PROTEIN
>gi|626624|pir||S46700 SSF1
protein - yeast (Saccharomyces
cerevisiae)



1.4 



0.47 



0.35 



0.018 



1e-05



(AF073359) benzaldehyde 
dehydrogenase [Pseudomonas 
sp. DJ77] 



1e-05






Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



687 



U14331 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



ins) 



DESCRIPTION 



Sus scrofa myogenin 
gene, complete cds 



AF074870 



Chironomus 
pallidivittatus clone 
12 19 non-telomeric 
Ssp repeat sequence 



0.064 



2781386 



0.063 



sapiens repeat 
,225523 I r egion DNA. 

Plasmodium
falciparum 
chromosome 2, 
section 15 of 73 of 
the complete 
sequence 



AE001378 



S.cerevisiae
chromosome VII
reading frame ORF
Z72947 YGR162w



Y 14723 



Choanomphalus
incertus
mitochondrial
cytochrome c oxidase
subunit I gene, partial



693 



X74103 



Streptomyces sp. 
gene for alkaline 
serine protease I 



694 



AF039843 



Homo sapiens 
Sprouty 2 (SPRY2) 
mRNA. complete cds 



<NONE> 



0.063 



<NONE> 



(AL00401Q) similar to Leucine-
rich transmembrane proteins;
44% similarity to U42767
(PID:g1736918) [Homo
sapiens]



P VALUE 






3e-33 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



0.063 



<NONE> 



<NONE> 



<NONE> 



0.063 



<NONE> 



<NONE> 



<NONE> 



0.06: 



<NONE> 



0.063 



1730713 



<NONE> 

PROTEIN IN UME3-PUB i
INTERGENIC REGION
>gi|2131866|pir||S62935
hypothetical protein YNL023c
yeast (Saccharomyces
cerevisiae)

>gi|1301855|gnl|PID|e239870
(Z71299) ORF YNL023c

[Saccharomyces cerevisiae]



<NONE> 



0.063 



232217



GLUTATHIONE S-
TRANSFERASE GST-6.0
(GST B1-1)
>gi|421 l9S|pir||S29772
glutathione transferase (EC
2.5.1.18) - Proteus mirabilis
>gi|2126142|pir||S718S2
glutathione transferase (EC
2.5.1.18) B - Proteus mirabilis
>gi|1053076 (U38482)



6.7 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 




Mouse M-twist gene
695 | M63650 mRNA, complete cds



696 | Y13298



Homo sapiens GDP
dissociation inhibitor
beta pseudogene



697 | X56600



Rat SOD-2 gene for

manganese-
containing superoxide
dismutase



0.06: 



0.063 



M.musculus mRNA

for 5HTx serotonin

698 | 223107 receptor

Plasmodium vivax

circumsporozoite

699 | M20670 protein gene. Vend.



0.063 



0.063 



0.063 



700 | Z62997



H.sapiens CpG DNA,
clone 76g11, reverse
read cpg76g11.rt1a



701 | U95094



Xenopus laevis XL- 
INCENP (XL- 
INCENP) mRNA, 
complete cds 



0.06; 



702 | U95098 



Xenopus laevis
mitotic

phosphoprotein 44
mRNA, partial cds



0.063 



703 | U95094



Xenopus laevis XL-
INCENP (XL-
INCENP) mRNA,
complete cds



0.063 



0.063 



DESCRIPTION 



1730141 



1085930 



3882143 



1708162 



4033395 



1350911 



2981200



3877951 



P VALUE 



339301S 



RETARDATION SYNDROME
RELATED PROTEIN 2
>gi|2135129|pir||S60173 fragile
X mental retardation syndrome
related protein - human
>gi|1098637 (U31501) fragile X
mental retardation syndrome
related protein [Homo sapiens]



hypothetical protein 4 - fowl 
adenovirus 1 



1.3 



1.3 



(AB018254) KIAA0711 protein
[Homo sapiens]



HUNTINGTIN
(HUNTINGTON'S DISEASE 
PROTEIN HOMOLOG) (HD 
PROTEIN) 



DNA GYRASE SUBUNIT B

subunit [Myxococcus xanthus]
RETINOIC ACID RECEPTOR
RXR-BETA sapiens)
>gi|3172498 f AFOrt^Q^
retinoic X receptor B
dJ1033B10.11 (Retinoid X
receptor beta (RXRB)) [Homo
sapiens] >gi|4249766

(AF120161) retinoic X receptor
beta



0.60 



0.45 



0.35 



(AF048732) cyclin T2b [Homo
sapiens] 



(Z81555) predicted using
Genefinder



(AL031174) hypothetical
protein 



0.16 



0.090 



6e-0: 



2e-10






Nearest Neighbor (BlastN vs. Genbank)

SEQ

ID | ACCESSION




DESCRIPTION | P VALUE



704 | D90872



705 M25528 



706 



Kohara clone 
#419(54.7-35.1 min.) 



M.crystallinum
ferredoxin-NADP+
reductase (fhrA)
mRNA, complete cds



0.063 



Strongyloides ratti
U45256 microsatellite B DNA



707 U95102



Xenopus laevis
mitotic

phosphoprotein 90
mRNA, complete cds



708 



Homo sapiens
TEL/AML1 fusion
AF044317 gene, partial sequence
Caenorhabditis



0.062 



0.062 



0.062 



709 Z73975 



710 



711 



X54232 



X03073 



712 Y12573 



713 



elegans cosmid 
T06ES, complete 
sequence 
[Caenorhabditis 
elegans] 



Human mRNA for
heparan sulfate
proteoglycan



0.062 



Bovine retinal mRNA
for transducin beta-
subunit



D.melanogaster Jun
and 14-3-3 zeta gene



L26573 



Bombus terrestris
mitochondrial 
cytochrome oxidase L 
partial cds. 



0.062 



0.062 



0.062 



0.062 



2498198 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



3108187 



1076741 



477578 



3879551



1684959 



CYTOCHROME B561
(CYTOCHROME B-561)



<NONE> 



<NONE> 



<NONE> 



<NONE> 



(AC004663) Notch 3 [Homo
sapiens]



chitinase (EC 3.2.1.14)
precursor - rice
>gi|S07955 (X87109) chitinase
[Oryza sativa]



sialidase - Actinomyces viscosus
>gi|141852 



3e-19 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



2.9 



(Z70756) similar to collagen 



(U20600) NADH
dehydrogenase subunit [Vanda
lamellata]



0.59 



0.087 



0.07; 



0.039 







SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



714 | U58994



715 | AB014551 



Human ladinin (LAD)
gene, complete cds



Homo sapiens mRNA
for KIAA0653 
protein, partial cds 



0.062 



716 | L16898



717 | X99343



0.062



Mus musculus
collagen alpha 1 type
XVIII mRNA, 5' end.



M.tuberculosis
guaA/B & choD 
genes 



0.062 



0.062 



718 | AF010193



719 | L10182



Homo sapiens MAD 
related gene SMAD7 
(SMAD7) mRNA, 
complete cds 



Myrmeleon sp. 18S
ribosomal RNA.



0.061 



0.061 



720 | Y14723



Choanomphalus
incertus
mitochondrial
cytochrome c oxidase
subunit I gene, partial



Bovine respiratory
syncytial virus
nucleoprotein mRNA,
721 | L27840 complete cds.



0.061 



0.061 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



2811078 



1326350 



1723657 



3873807 



<NONE> 



<NONE> 



<NONE>



542955 



DESCRIPTION



P VALUE 



AMINOPEPTIDASE)
(ARGININE

AMINOPEPTIDASE)
(CYTOSOL

AMINOPEPTIDASE IV) (AP-
B) >gi|2039143 (U61696)
aminopeptidase B [Rattus
norvegicus]

(U58748) similar to potential
transmembrane domains in S.
cerevisiae nuclear division
RFT1 protein (SP:P38206)



9e-06 



li^LflHfcHLAL JK.bKD 
PROTEIN IN ERV1-GLS2
INTERGENIC REGION
>gi|2132587|pir||S64322
probable membrane protein
YGR031w - yeast
(Saccharomyces cerevisiae)

>gi|1323010|gnl|PID|e243277
(Z72816) ORF YGR031w

[Saccharomyces cerevisiae]



5e-10 



(Z49907) B0491.1
[Caenorhabditis elegans]



<NONE> 



<NONE> 



1e-14



2e-19 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



nucleoporin p62 - human 



8.6 






Nearest Neighbor (BlastN vs. Genbank)



SEQ 

ID | ACCESSION



DESCRIPTION P VALUE 



722 | U95094



723 U95098 



Xenopus laevis XL- 
INCENP (XL- 
INCENP) mRNA, 
complete cds 



Xenopus laevis 
mitotic 

phosphoprotein 44 
mRNA, partial cds 



724 U26463 



Sporidiobolus 

salmonicolor 

NADPH-dependent
aldehyde reductase
gene, complete cds



725 



AF035443 



726 | Z48584



Xenopus laevis wee1
homolog mRNA, 
complete cds 



Caenorhabditis 
elegans cosmid 
ZK 132 i, complete 
sequence 

[Caenorhabditis 
elegans] 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



P VALUE 



>us 



0.061 



494454 



0.061 



3845272 



scrofa Sus scrofa 

gi|14212I0|pdb|lPCP| Porcine 
Spasmolytic Protein (Psp) (Nmr, 
19 Structures) Spasmolytic 
Polypeptide 

>gi|1633061|pdb|2PSP|B Chain 
B, Porcine Pancreatic 
Spasmolytic Polypeptide 



(AE001417) hypothetical
protein [Plasmodium
falciparum]



1710288 



0.061 



3979720 



(U79302) unknown [Homo
sapiens]

EMBL:D33048 comes from this 
gene; cDNA EST 

EMBL:D35780 comes from this 
gene; cDNA EST yk442c6.3 
comes from this gene; cDNA 
EST yk442c6.5 comes from this 
gene; cDNA EST yk398f6.3 
comes from this gene; cDNA
E... 

>gip9798l6|gnl|PID|el358315 
EST EMBL:D35780 comes
from this gene; cDNA EST 
yk442c6.3 comes from this 
gene; cDNA EST yk442c6.5 
comes from this gene; cDNA 
EST yk398f6.3 comes from this 
gene; cDNA E...



0.44 



0.061 



3183491 



HiPUlHfcllc'AL ilSKD 

PROTEIN C27F2.7 IN
CHROMOSOME III
>gi|1065510 (U40419) C27F2.7
gene product [Caenorhabditis
elegans]



2e-04 



3e-11







Nearest Neighbor (BlastN vs. Genbank)



DESCRIPTION 



P VALUE 



727 | X61489



Zea mays pep gene 
for (C3 type) 

phosphoenolpyruvate 
carboxylase 



Drosophila 

melanogaster 

Windbeutel (wind)

gene, complete cds

Brassica rapa mRNA

for SRK45, complete
729 AB012106 cds



728 | AF025408



730 



Arabidopsis thaliana
cellulose synthase
catalytic subunit (Ath-
B) mRNA, complete
AF027174 cds



0.061 



0.061 



0.060 



731 



H.sapiens mRNA for
carnitine

palmitoyltransferase I
Y08682 type I



732 



Xenopus laevis XL-
INCENP (XL-
INCENP) mRNA,
U95094 complete cds



0.060 



0.060



733 I AF064030 



Helianthus tuberosus 
lectin 2 mRNA. 



complete cds 



734 | AF100694



735 | U95102 



Mus musculus 
Pontin52 mRNA, 
complete cds 



Xenopus laevis 
mitotic 

phosphoprotein 90 
mRNA. complete cds 



0.060 



0.060 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



0.060 



736 | U76523



Sambucus nigra lectin 
precursor mRNA. 
complete cds 



0.060 



737 | U69668



Human nuclear pore 
complex-associated 
protein TPR 



0.060 



0.060 



2496887 



3702295 



<NONE> 



<NONE> 



3319446 



1041119 



632209 



3098348 



125978 



2055394 



4127854 



DESCRIPTION
HtHUIHiillLAi J2.QKD 



P VALUE



PROTEIN C09F5.2 IN
CHROMOSOME III
>gi|732538 (U22832) C09F5.2
gene product [Caenorhabditis
elegans]



(AC0057S3) R33083.1 [Homo
sapiens]



<NONE> 



1e-15



2e-60 



<NONE> 



<NONE> 



(AF077541) contains similarity 
to class-I aminoacyl-tRNA 
synthetases [Caenorhabditis 
elegans] 



(D78016) TRAE [Enterococcus
faecalis]

regulatory protein Rex - primate 
T-lymphotropic virus PTLV-L 
(fragment) 



8.1 



(AF037401) neuropeptide 
Y/peptide YY receptor Yc 
[Danio rerio] 

~ ItiN PkkCURSOR 
(LEUKOCYTE ANTIGEN 
RELATED) 

>gi|70146|pir||TDHULK 
leukocyte antigen- related 
protein precursor - human 
>ai|34267 sapiensl 



8.1 



3.7 



2.1 



(U87306) transmembrane 
receptor UNC5H2 [Rattus 
norvegicus] 

(Y14063) ChT1 thymocyte
antigen [Gallus gallus]



1.2 



0.32 



9e-04 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



738 | AB014553



739 U95098 



DESCRIPTION | P VALUE | ACCESSION



Homo sapiens mRNA
for KIAA0653 
protein, partial cds 



Xenopus laevis 
mitotic 

phosphoprotein 44 
mRNA, partial cds 



740 296260 



H.sapiens telomeric
DNA sequence, clone 
I2QTEL101,read 
_12QTELOO101.seq 



0.060 



0.060 



741 M93128 



742 AB012106



743 | AB007920



744 M60445 



Mouse homeobox 
protein (EVX2) 
mRNA, complete cds 



Brassica rapa mRNA
for SRK45, complete
cds 



Homo sapiens mRNA 
for KIAA0451
protein, complete cds 



0.059 



0.059 



0.059 



0.059 



Human histidine 
decarboxylase (HDC) 
mRNA. complete cds 



0.058 



745 U01836 



746 AF090115 



747 



AB012105 



Ustilago maydis 

exodeoxyribonucleas
e (REC1) gene,
complete cds.



0.058 



Lycopersicon 
esculentum cytosolic 
class II small heat 
shock protein HCT2 
(HSP17.4) mRNA. 

complete cds 

Brassica rapa mRNA
for SLG45, complete
cds 



0.058 



0.057 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



DESCRIPTION 



1326350 



2632098 



<NONE> 



<NONE> 



transmembrane domains in S.
cerevisiae nuclear division
RFT1 protein (SP:P38206)



(Y15513) Prodos protein
[Drosophila melanogaster] 



1e-09



<NONE> 



1652318



<NONE> 



(D90904) lysostaphin
[Synechocystis sp.]



479491



transcription factor brn-3b 
human 



<NONE> 



<NONE> 



1171908 



TRANSPORT SYSTEM 
PERMEASE PROTEIN OPPC 
>gi|1075086|pir||D64184 
oligopeptide transport system 
permease protein (oppC)C 
homolog - Haemophilus 
influenzae (strain Rd KW20)
permease protein (oppC) 
[Haemophilus influenzae Rd] 



3193265 



(AF069131) chitinase [Bacillus
subtilis] 



433385 



(CJ03973) dynein heavy chain 
isotype 7A [Tripneustes gratilla]



<NONE> 



<NONE> 



4.7 



0.71 



<NONE> 



1.5 



\ 

o.oo: 



3.4 






Nearest Neighbor (BlastN vs. Genbank)



SEQ 

ID | ACCESSION | DESCRIPTION



Arabidopsis thaliana



P VALUE 



mRNA for 

neoxanthin cleavage

748 | AJ0058n 

Lagopus lagopus
genomic
microsatellite

749 | Y16828 sequence LLST4



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



0.056 



0.056 



Sambucus nigra
ribosome inactivating
protein precursor
750 AF012899 mRNA, complete cds



Sambucus nigra
hevein-like protein
751 | AF074385 mRNA, complete cds



0.055 



752 | U76523



M92069 



S78; 



Sambucus nigra lectin
precursor mRNA, 
complete cds 

Human retrovirus-like
sequence -isoleucine c 



0.055 



GlL=ankyrin-like 
repeat (orf virus OV. 
NZ2, Genomic. 1608 



0.035 



0.034 



M 15646 



AF027174 



Chicken myosin 
alkali light chain 
mRNA, complete cds, 
clone pFl. 



Arabidopsis thaliana
cellulose synthase
catalytic subunit (Ath-
B) mRNA, complete
cds



0.027



1 



0.025 



ACCESSION 



<NONE> 



3328678 



<NONE> 



137339 



<NONE> 



3334221 



DESCRIPTION 



3877815 



P VALUE 



<NONE> 



(AE001299) hypothetical
protein [Chlamydia trachomatis]



<NONE> 



4.3 



<NONE> 



69 KD PROTEIN
>gi|94375|pir||S19150
hypothetical protein, 69K - 
turnip yellow mosaic virus 



<NONE> 



0.69 



<NONE> 



<NONE> 

(AF043700) contains similarity 
to human RNA-binding protein 
FUS/TLS (SW:Q2S009) 



HYDROXYPHENYLPYRUVA
TE DIOXYGENASE 4-

hydroxyphenylpyruvate
dioxygenase [Mycosphaerella
graminicola]



<NONE> 



<NONE> 



(Z96048) predicted using
Genefinder



6e-17 



5.0 






Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



757 AJ002291 



P VALUE 



758 | X79104



759 | U95102



Streptococcus 
pneumoniae pop lb 
gene, complete 



0.025 



38804S7 



C.botulinum (NCTC
7272 type A) HA-33
and P-21 genes.



phosphate pyrophosphokinase; 
cDNA EST EMBL:D73173
comes from this gene; cDNA 
EST EMBL:D70909 comes 
from this gene; cDNA EST 
EMBL:D73449 comes from this 
gene; cDNA EST 

EMBL:D76167 comes from this 




n e 



0.024 



2648615 



(AE000970) tungsten
formylmethanofuran
dehydrogenase, subunit B (fwdB
2) [Archaeoglobus fulgidus]



1.7 



Xenopus laevis 
mitotic 

phosphoprotein 90 
mRNA. complete cds 



0.024 



760 | U36197 



761 



Chlamydomonas 
reinhardtii cobalamin 
independent 
methionine synthase 
mRNA. complete cds 



1663698 



L38865 



762 



AF035948 



763



X98890 



Macaca mulatta
(clone MM V A 63) T-
cell receptor alpha
(TCR A) mRNA,
partial cds.



0.024 



(DS3785) expressed
ubiquitously; product similar to
D.melanogaster mam protein.
[Homo sapiens]



585723 



0.02: 



PEROXISOME 

PROLIFERATOR 

ACTIVATED RECEPTOR 

GAMMA (PPAR-GAMMA)
>gi|283SlS|pir||C42214
peroxisome proliferator-
activated receptor gamma chain
African clawed frog >gi|214668
(MS4I63) peroxisome
proliferator activated receptor
gamma [Xenopus laevis]



6.1 



4.7 



<NONE> 



Mus musculus insulin 
receptor substrate-j 



S.tuberosum mRNA
for inorganic
phosphate
transporter, StPT1



0.023 



0.023 



<NONE> 



2500587 



SPLICEOSOME
ASSOCIATED PROTEIN 49
spliceosome-associated protein
SAP-49 - human >gi|556217



0.42 



<NONE> 



110072



proline-rich protein MP4 - 
mouse >'ei|53 1S2 



0.40 



0.IS 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
P VALUE | ACCESSION



764 



L.esculentum mRNA 



0.022 



Homo sapiens
chromosome 5, P1
clone 1209CI (LBNL
Hi 04), complete
sequence [Homo

765 | AC004498 sapiens]

Human prostatic acid
phosphatase (ACPP)
U07083 gene, exon 1

S. tuberosum mRNA 
for inorganic 
phosphate 

767 | X98890 transporter, StPT1 0.022



766 



DESCRIPTION 



0.022 



<NONE> 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



0.022 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



768 | X56488



L.esculentum LAT59 
gene 5'flanking 
region, expressed 
during pollen 
maturation 



<NONE> 



0.022 



Pseudorabies virus 
with upstream and 
downsteam 
769 | M34651 sequences.



<NONE> 



0.022 



770 X66727 



P.taeda gene for
protochlorophyllide 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



(292S06)KI0G4.4 



[Caenorhabditis elegans]



Xenopus laevis
mitotic

phosphoprotein 90

771 | U95102 mRNA, complete cds

Xenopus laevis
mitotic

phosphoprotein 44

772 | U95098 mRNA, partial cds



0.022 



0.022 



773 | X71932



774 | X87369 



H.sapiens XB gene
for tenascin-X, intron
14



0.022 



C.perfringens nanH 
gene & ORF1.2.3 & 4 



1854452 



(D89501) similar to salivary 
proline-rich protein P-B [Homo 
sapiens] 



4.3 



3021699 



(AB005298) BAI2 [Homo
sapiens] 



627059 



0.022 



liver stage antigen LSA-1 -
Plasmodium falciparum
>gi|9916 (X56203) liver stage
antigen



(U78975) poly(ADP-ribose)
2062407 glycohydrolase [Bos taurus]



0.058 



0.056 






Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



P VALUE 



775 



776 



777 



778 



779 



782 U95094 



783 



Gallus gallus mRNA
Y14971 for K60 protein 



0.022 



Caenorhabditis
elegans cosmid
AF003133 T21E3



Caenorhabditis
elegans cosmid
AF003133 T21E3



Human helix-loop-
helix proteins Id-1
(ID-1) and Id-1' (ID-
1) genes, complete
U57645 cds



0.022 



0.022 



Methanococcus
jannaschii section 112
of 150 of the
U67570 complete genome



0.021 



Trypanosoma cruzi
calcium-binding
protein (CUB2.S)
L01584 gene, complete cds. 0.021



0.021 



781 L04787 



Borrelia hermsii outer
membrane lipoprotein 



Xenopus laevis XL-
INCENP (XL-
INCENP) mRNA,
complete cds



0.021 



L36890 



Saccharomyces 
cerevisiae 
mitochondrion 
transfer RNA-Thrl 
(tRNA-Thr) gene; 
transfer RNA-Val 
(tRNA-Val) gene;
pxi2 gene, complete
cds; ORF2 and origin
of replication (ori5),



0.021 



0.021 



ACCESSION 



DESCRIPTION 



134091 



1709997 



1709997 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



U1 SMALL NUCLEAR

mum 



KD (U1 SNRNP 70 KD)
>gi|85864|pir||S02016 U1
snRNP 70K protein - African
clawed frog >gi|65179
(X12430) U1 70K [Xenopus
laevis] 



P VALUE 



DNA REPAIR PROTEIN
RAD18 >gi|1150622 protein
rad18 [Schizosaccharomyces
pombe]



0.032 



DNA REPAIR PROTEIN
RAD18 >gi|1150622 protein
rad18 [Schizosaccharomyces
pombe]



2e-08 



2e-08 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE>



<NONE> 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



784 | M76741 



P VALUE 



785 I M87504 



786 | U94346 



787 | L01584 



Homo sapiens biliary 
glycoprotein (BGP) 
gene, partial cds 




Tetrahymena 
thermophila histone 
H3 (HHT2) gene. 
complete cds 

Human calpain-like
protease (htra-3)
mRNA, complete cds



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



0.021 



0.021 



788 | U36530 



Trypanosoma cruzi
calcium-binding
protein (CUB2.S)
gene, complete cds.
Pongo pygmaeus L I

microsatellite, clone
#1, from the tandemly
repeated genes
encoding U2 small
nuclear RNA (RNU2
locus)



0.021 



0.021 



789 



Human gene for
interleukin 1 alpha
X03833 (IL-1 alpha)



0.021 



790 | U20806



Dictyostelium 
discoideum guanine 
nucleotide-binding 
protein alpha subunit 
5 (G alpha 5) gene,
complete cds. 



0.021 



791 I 259258 



792 



793 



H.sapiens CpG DNA,

clone 13d2, reverse

read cpg13d2.rt1c.
Plasmodium

falciparum strain 7G8

chloroquine

resistance candidate

protein (cg2) gene,

AF030692 complete cds

Methanococcus

jannaschii section 112

of 150 of the

U67570 complete genome



0.021 



0.021 



0.021 



0.021 



DESCRIPTION 



<NONE> 



<NONE> 



<NONE>



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



416974 



EARLY TRANSCRIPTION 
FACTOR 70 KD SUBUNIT



1401211 



3121732 



3024190 



(U58510) RNA helicase
homolog [Chlorarachnion

CCMP621]

ACONITATE HYDRATASE
(CITRATE HYDRO-LYASE)
(ACONITASE) >gi|2183256
(AF002133) aconitase
[Mycobacterium avium]

NJNfc PROtfilN

>gi|2120251|pir||S66581
hypothetical protein 56 - phage
S2 >gi|1051U4 (X92588)
orf56; related to nin60 (ninE) of
bacteriophage lambda



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



8.9 



8.8 



7.0 



2341037 



(AC000104) F19P19.17 
[Arabidopsis thaliana]



5.8 



4.0 






Nearest Neighbor (BlastN vs. Genbank) 



SEQ 

ID | ACCESSION



D86566 



795 | L11648



Human DNA for 
NOTCH4, partial cds 



796 U95094 



Streptomyces 
coelicolor sigma 
factor (rpoX) gene,
complete cds. 



Xenopus laevis XL- 
INCENP (XL- 
INCENP) mRNA. 
complete cds 



Rattus norvegicus 
microtubule- 
797 | U30938 associated protein 2



798 D82364 



Chicken mRNA for 
TSC-22 variant, 
complete cds, clone 
SLFEST52 



799 U40041



800 X71932 



Gallus gallus eHAND
mRNA, complete cds 0.021



H.sapiens XB gene
for tenascin-X, intron 
14 



801 AP042333 



802 | L37380



Caenorhabditis 
elegans cosmid 
803 | AF003133 T21E3



Oryza sativa 24-
methylene lophenol
C24(1) methyltransfer
ase mRNA, complete
cds



Rat apical endosomal 
glycoprotein mRNA. 
complete cds. 



0.021 



0.021



hypothetical 119.5K protein
(uvrA region) - Micrococcus
luteus



NEUROENDOCRINE

CONVERTASE 1
PRECURSOR (NEC 1) (PC1)
(PROHORMONE
CONVERTASE 1) propeptide
processing protease [Mus
cookii]



627059 



liver stage antigen LSA-1 -
Plasmodium falciparum
>gi|9916 (X56203) liver stage
antigen



0.021 



3334377 



0.021 



1709997 



(X83413)US8 [Human 
herpesvirus 6] 




TRANSMEMBRANE 
PROTEASE, SERINE 2 



DNA REPAIR PROTEIN
RAD18 >gi|1150622 protein
rad18 [Schizosaccharomyces
pombe]




0.054 



0.014 



1e-05



3e-05






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



804 | X57689



805 | U95102



806 



812 



P VALUE 



Rabbit mRNA for



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



calcium channel BI-2 
(lambda CBP109 and 

CBion 



Xenopus laevis 
mitotic 

phosphoprotein 90 
mRNA. complete cds 



X77753 



0.021 



0.021 



807 | X98890 



H.sapiens TROP-2 

gene 

S. tuberosum mRNA 
for inorganic 
phosphate 
transporter, StPT1



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath 



808 | AF027173 cds



Homo sapiens 
Promoter Region and 
809 | AJ224935 PCK2 gene



Sambucus nigra 
ribosome inactivating 
protein precursor

810 | U76524 mRNA, complete cds

A.thaliana GBF1

811 | X99941 | gene



X65138 



M.musculus mRNA 
for tyrosine kinase > 
::gb|S57168|S57168
Sek=Eph-related
receptor protein
tyrosine kinase [mice,
mRNA, 4242 nt]



0.021 



0.021 



0.020 



0.020 



0.020 



0.020 



0.020 



2959370 



1109830



1723657 



2137872 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



DESCRIPTION 



P VALUE 



(AL022117) hypothetical
protein

(041534) coded for by C. 
elegans cDNA CEESI42F; 
Similar to helicases of 
SNF2/RAD54 family. 
[Caenorhabditis elegans]

PROTEIN IN ERV1-GLS2 
INTERGENIC REGION 
>gi|2132587|pir||S64322 
probable membrane protein 
YGR031w - yeast 
(Saccharomyces cerevisiae) 

>gi|1323010|gnl|PID|e243277
(Z72816) ORF YGR031w
[Saccharomyces cerevisiae] 



1e-10



5e-11



5e-11



zinc finger protein PZF - mouse 

>gi|453376



2e-19 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 







SEQ 
ID 



813 



Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



P VALUE 



L04787 



AJ223633 



AB012106



816 AE001539



817 AF074386 



Borrelia hermsii outer 
membrane lipoprotein 



Enterococcus faecium
genes encoding 
enterocin L50A and 
enterocin L50B plus 
5' and 3' flanking 
regions 



Brassica rapa mRNA 
for SRK45, complete 
cds 



ACCESSION 



DESCRIPTION 



0.020 



<NONE> 



0.020 



Helicobacter pylori, 
strain J99 section 100 
of 132 of the 
complete genome 



Sambucus nigra
hevein-like protein
mRNA, complete cds



M55264 




819 AF052163 



820 



Herpesvirus saimiri 
dihydrofolate 
reductase (DHFR) 
and snRNA (HSUR) 
genes, complete cds. 



0.020 



<NONE> 



<NONE> 



0.020 



0.020 



0.020 



172292 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



94173 




2924250 



Homo sapiens clone
24456 mRNA
sequence



AF074387 



821 AF052163 



Sambucus nigra 
hevein-like protein
mRNA, complete cds 



Homo sapiens clone 
24456 mRNA 
sequence 



0.020 



0.020 



0.020 



(L11895) transmembrane
protein [Saccharomyces
cerevisiae] 



pol polyprotein - Chinese 
hamster intracisternal A-particle
CHIAP34 



(Z98745) dJ29KI.2 [Homo 
sapiens] 



1706288 



2113798 



(D(2C) DOPAMINE 
RECEPTOR) 

>gi|21194S2|pir||I49246 D4 
dopamine receptor - mouse 
>gi|758427 (U19830) D4
dopamine receptor [Mus 
musculus] 

>gi|1095539|prf||2109259A
dopamine D4 receptor [Mus 
musculus] 



<NONE> 



<NONE> 



(Z83259) AmphiBrf38 
[Branchiostoma floridae] 



3874733 



'Ubib4) cDNA EST
EMBL:T02354 comes from this
gene; cDNA EST
EMBL:D3269S comes from this
gene; cDNA EST
EMBL:D35411 comes from this
gene



4.9 



4.7 



4.7 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



822 | L11002



823 | AE001539



824 | X98890 



825 | U97519



826 | AF003133 



P VALUE 



Rat ankyrin binding
glycoprotein-1 related
mRNA sequence 



Helicobacter pylori, 
strain J99 section 100 
of 132 of the 
complete genome 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



0.020 



0.020 



DESCRIPTION 



552132 



172292 



S.tuberosum mRNA 
for inorganic 
phosphate 
transporter, StPT1



0.020 



3879798 



827 | U32857 



Homo sapiens 
podocalyxin-like 
protein mRNA,
complete cds 



0.020 



Caenorhabditis 
elegans cosmid 
T21E3 



1345633 



0.020 



Saccharomyces 
cerevisiae VAR1
gene, mitochondrial
gene encoding
mitochondrial
protein, 3' processing
site, partial sequence



1709997 



0.019 



<NONE> 



P VALUE 



(K01664) Bkm-like protein
[Drosophila melanogaster]

(L11895) transmembrane
protein [Saccharomyces
cerevisiae]

Domain (2 domains); cDNA
EST yk390b10.3 comes from
this gene; cDNA EST
EMBL:D71652 comes from this
gene; cDNA EST yk275f8.3
comes from this gene; cDNA
EST yk393b9.3 comes from this
gene; cDNA EST yk37...
>gi|3 8S0220|gnI|PID|e i 349842 
yk390b10.3 comes from this
gene; cDNA EST
EMBL:D71652 comes from this
gene; cDNA EST yk275f8.3
comes from this gene; cDNA
EST yk393b9.3 comes from this
gene; cDNA EST yk37...



SYNTHASE, CYTOPLASMIC
(C1-THF SYNTHASE)
METHYLENETETRAHYDROFOLATE
DEHYDROGENASE/
METHENYLTETRAHYDROFOLATE
CYCLOHYDROLASE
C1-tetrahydrofolate synthase
[Rattus norvegicus]



DNA REPAIR PROTEIN 
RAD18 >gi|1150622 protein
rad18 [Schizosaccharomyces
pombe]



3.8 



3.8 



1.3 



0.066 



2e-07 



<NONE> 



<NONE> 






828 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath 
B) mRNA. complete 
AF027174 cds 



0.019 



829 



830 



Laccaria bicolor 
glyoxal malate
synthase protein
AF034099 mRNA, complete cds



AF100694 



831 



U24578 



Mus musculus
Pontin52 mRNA, 
complete cds 
Human RP1 and
complement C4B 
precursor (C4B) 
genes, partial cds. 



0.019 



Sambucus nigra lectin
precursor mRNA,

832 | U76523 complete cds

Dibenzofuran-

degrading bacterium
DPO360 2,3-
dihydroxybiphenyl
1,2-dioxygenase
(bphC) gene,
complete cds and 2-
hydroxy-6-oxo-6-
phenylhexa-2,4-
dienoic acid
U57649 hydrolase



0.013 



0.013 



0.011 



834 



835 



Z.mays gene for
phosphoenolpyruvate
X15642 carboxylase



0.011 



0.011 



X51623 



836 



US3656 



C.elegans collagen
gene col-13



Rattus norvegicus NT 
KB gene, promotor 



region



0.010 



0.008 



ACCESSION 



DESCRIPTION 



250638 1 



3880930 



<NONE> 



478673 



<NONE> 



<NONE> 



<NONE> 



PROTEIN 4 PRECURSOR 
(TRANSFORMING PROTEIN 
INT-3) mammary gene mRNA, 
complete cds.], gene product 
[Mus musculus]



I) similar to 
Phosphoglucomutase and 
phosphomannomutase 
phosphoserine; cDNA EST 
EMBL:D36168 comes from this
gene; cDNA EST
EMBL:D70697 comes from this
gene; cDNA EST yk373h9.5 
comes from this gene; cDNA 
EST EMBL:T008... 



6e-15 



<NONE> 



<NONE> 



proline-rich protein precursor - 
kidney bean vulgaris]



3.1 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



1695686 



(D83706) pyruvate carboxylase 
[Bacillus stearothermophilus] 



3.1 



4240195 



(AB020660) KIAA0853 protein 
[Homo sapiens] | 10.0 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 




P VALUE 



Homo sapiens gene
encoding retina-
specific guanylyl
837 | AJ222657 | cyclase



840 



Brassica rapa mRNA 

for SRK45, complete
838 | AB012106 cds



0.008 



839 | U76524



Sambucus nigra
ribosome inactivating
protein precursor
mRNA, complete cds



0.008 



0.008 



Sambucus nigra
ribosome inactivating
protein precursor
AF012899 mRNA, complete cds



0.008



Sambucus nigra
hevein-like protein
841 AF074385 mRNA, complete cds



842 | U48734



Human non-muscle 
alpha-actinin mRNA. 
complete cds 



843 | U66669



Homo sapiens 3-
hydroxyisobutyryl-
coenzyme A
hydrolase mRNA,
complete cds



0.008



0.008



0.007 



844 | D16492



Mouse mRNA for 
P100 serine protease

of Ra-reactive factor
(RaRF), complete cds



0.007 



ACCESSION 



417704 



544024 



532468 



4101160 



1711520 



2829922 



<NONE> 



<NONE> 



DESCRIPTION 
POL POLYPROTEIN
(ORF1A/1B) [CONTAINS:
RNA-DIRECTED RNA
POLYMERASE; HELICASE;
PROTEASE]

CHLORIDE CHANNEL

PROTEIN, SKELETAL
MUSCLE (CHLORIDE
CHANNEL PROTEIN 1) (CLC-
1) human >gi|397!43 (Z25587)
human ClC-1 muscle chloride
channel [Homo sapiens]
>gi|398161 (Z25884) human
ClC-1 muscle chloride channel
[Homo sapiens]



P VALUE 



7.4 



(U13643) similar to reverse
transcriptase; possible 
pseudogene [Caenorhabditis 
elegans] 



(AF002589) cytochrome 
oxidase I [Austrofundulus 
limnaeus] 



4.6 



3.8 



2.7 



SRB-8/9 PROTEIN
>gi|1334996



1.6 



(AC002291) extensin
[Arabidopsis thaliana]



0.11 



<NONE> 



<NONE> 



<NONE> 



<NONE> 






Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



845 | D90923



DESCRIPTION
Human



immunodeficiency 
virus type 1 proviral 
DNA for envelope 
glycoprotein, partial 
cds. isolate 03 S 



846



847 



848 



AB011087



AE000688



X63723 



Homo sapiens mRNA 
for KIAA0515
protein, partial cds 



0.007 



0.007 



Aquifex aeolicus 
section 20 of 109 of 
the complete genome 



B.bovis WC1.1
mRNA 



849 | AF074386



850 



Sambucus nigra 
hevein-like protein 
mRNA. complete cds 



0.007 



0.007 



0.007 



851 



Human beta globin 
region Alu repetitive
J00097 | sequence type T.
Human

immunodeficiency
virus type 1 proviral
DNA for envelope
glycoprotein, partial
D90923 | cds, isolate 03S



0.007 



852 



U95094 



Xenopus laevis XL-
INCENP (XL-
INCENP) mRNA,
complete cds



0.007 



0.007 



853 | X91618 



T.castaneum
hunchback gene



854 | X03838 



Rat nontranscribed
spacer (NTS)
downstream of 28S
rRNA gene



0.007 



0.007 



855 



M55049 



Rattus norvegicus
interleukin-2 receptor
alpha chain (CD25)
mRNA, complete cds.



0.007 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



P VALUE ACCESSION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



DESCRIPTION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 






ACCESSION 



DESCRIPTION 



P VALUE 



856 | Z64318



857 | AF027173



H.sapiens CpG DNA. 
clone 9e2, reverse 
read cpg9e2.rt1a



858 | AF027174 



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath 
A) mRNA, complete 
cds 



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath 
B) mRNA. complete 
cds 



0.007 



0.007 



859 | AF012899



860 | X95276



Sambucus nigra
ribosome inactivating
protein precursor
mRNA, complete cds
P.falciparum
complete gene map of
plastid-like DNA



0.007 



0.007 



0.007 



861



Lycopersicon
esculentum class II
small heat shock
protein Le-HSP17.6
U72396 | mRNA, complete cds



862 



Mus musculus
Pontin52 mRNA,
AF100694 complete cds



0.007 



863 



Leucania separata
nuclear polyhedrosis
virus DNA for p13,
Jxe, envelope protein, 
AB000383 complete cds 



864 



Human DNA for 
D86566 NOTCH4, partial cds 



0.007 



0.007 



0.007



865 | U76524



Sambucus nigra
ribosome inactivating
protein precursor
mRNA, complete cds



0.007 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 






DESCRIPTION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE>



<NONE> 






Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



866 



AF027173 



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath 
A) mRNA, complete 
cds 



867 



AF027174



0.007 



Arabidopsis thaliana
cellulose synthase
catalytic subunit (Ath
B) mRNA, complete
cds 



868



0.007 



U76524 



Sambucus nigra
ribosome inactivating
protein precursor
mRNA, complete cds



0.007 



869 



870 



Brassica rapa mRNA 
for SRK45, complete 
AB012106 cds



AF093268 



871 



872 



AF074386 



L42319



Rattus norvegicus 
homer-1c mRNA,
complete cds 



0.007 



0.007 



Sambucus nigra 
hevein-like protein 
mRNA. complete cds 



0.007 



Bos taurus (clone 
Sal3.S) tristetraprolin



0.007 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



3047072 



(AF058825) No definition line 
found [Arabidopsis thaliana]



975754 



(U29359) SpaO [Salmonella 
enterica]



1213557 



(U50199) coded for by C.
elegans cDNA yk89e9.5; coded
for by C. elegans cDNA cm7g5;
coded for by C. elegans cDNA
cm14b9; coded for by C.
elegans cDNA yk52g5.5; coded
for by C. elegans cDNA
yk76e5.5; coded for by C.
elegans cDNA yk131f11.5; c...



2499568 



4092077 



ISOASPARTATE(D-
ASPARTATE) O-
METHYLTRANSFERASE
(PROTEIN-BETA-
ASPARTATE
METHYLTRANSFERASE)
(PIMT) (PROTEIN L-
ISOASPARTYL/D-
ASPARTYL
METHYLTRANSFERASE)
methyltransferase [Drosophila
melanogaster] >gi|1171337
melanogaster]



(AF095353) toll-like receptor 4 
mutant [Mus musculus] 



151377 



2507337 



(M80653) tetraheme 
[Pseudomonas stutzeri] 
TRANSCRIPTION 
TERMINATION FACTOR 
RHO 



P VALUE 



8.6 



8.3 



6.2 



5.5 






SEQ 
ID



Nearest Neighbor (BlastN vs. Genbank)




ACCESSION 



DESCRIPTION 



P VALUE 



873 | M59815



874 



X63723 



875 | AB009864



Human complement 
component C4A 
gene, exons 10 
through 41.
B.bovis WC1.1
mRNA 

Expression vector 
pME18S-FL3,
complete sequence 



876 U76524 



Sambucus nigra
ribosome inactivating 
protein precursor 
mRNA. complete cds 



877 | U95102



Xenopus laevis 
mitotic 

phosphoprotein 90 
mRNA. complete cds 



0.007 
0.007

0.007 



0.007 



Chicken mRNA for 
aldehyde



879 | AF027735 



Nephila clavipes 
minor ampullate silk 
protein MiSpl 
mRNA. partial cds 



0.007 



Bos taurus tuftelin 
880 | AF105228 mRNA, complete cds



0.007 



0.007 



Mus musculus 
Pontin52 mRNA,
881 | AF100694 complete cds



0.007 



ACCESSION 



DESCRIPTION 



3876769 
2969893 

2137618 



2804497 



P VALUE 



) Similarity to Human
Prolyl 4-hydroxylase alpha
subunit (SW:P4HA_HUMAN);
cDNA EST yk219g12.5 comes
from this gene; cDNA EST
yk319d8.5 comes from this
gene; cDNA EST yk339d11.5
comes from this gene; cDNA
EST yk371c9.3...



(AJ001858) human SIM2 
[Homo sapiens]

p45 NF-E2 related factor 2 - 
mouse musculus] 



(AF043705) contains similarity 
to C2H2-type zinc fingers



440298 



2394390 



3036S02 



(L27469) product of alternative 
splicing [Drosophila 
melanogaster] 



(L75945) flagellar export 
protein [Borrelia burgdorferi]



(AF017434) pmi-like gene
product [Methylobacterium 



2500S14 



(AL022373) putative protein
FU 1 Hi: 1 11 AL tolTi KB — 

I PROTEIN T27F2.1 IN 

CHROMOSOME V 

>gi|3880311|o n l|PID|el349S55 

BX42 (SW:BX42_DROME); 

cDNA EST EMBL:C07233 

comes from this gene; cDNA 

ESTEMBL:C08532 comes 

from this gene; cDNA EST 

yk501hl0.3 comes from this 

gene; cDNA EST ykiOlfl .3... 



5.3



5.3 



5.1 



5.0 



4.7 



4.0 



3.9 



3.8 






Nearest Neighbor (BlastN vs. Genbank)



ACCESSION DESCRIPTION P VALUE 



882 



L. major mRNA for 
X93567 Ibeta-tubulin f l404bD 



Brassica rapa mRNA 
for SRK45, complete 
883 | AB012106 cds



Homo sapiens 
WSCR4 gene, exons 
884 | AF041056 3 and 4 



0.007



885 



Rattus norvegicus 
homer-1c mRNA,
AF093268 complete cds

Hevea brasiliensis 3-



0.007 



hydroxy-3-
methylglutaryl-
coenzyme A
reductase gene, 3'
886 | M74798 end.



887 Z62997 



0.007 



0.007 



H.sapiens CpG DNA,
clone 76g11, reverse
read cpg76g11.rt1a.



888 | AF074386



Sambucus nigra
hevein-like protein
mRNA. complete cds 



0.007 



ACCESSION 



DESCRIPTION 



2317862 



3881103 



135817 



1723518 



10012S2 



1176532 



889 



L29426 



Synechocystis species
(strain PCC 6803) 
drgA gene, complete 
cds. 



0.007 



0.007 



2498317 



3SS2275 



P VALUE 



modules 4 & 5 [Streptomyces 
fradiae] 



(AL032646) predicted using 
Genefinder; cDNA EST 
EMBL:D76407 comes from this 
gene; cDNA EST
EMBL:C08999 comes from this
gene; cDNA EST yk199b12.5
comes from this gene; cDNA
EST yk282a4.5 comes from this
gene; cDNA EST EMBL:C0...
THROMBIN RECEPTOR 
PRECURSOR human 
>gi|339677 (M62424) thrombin 
receptor [Homo sapiens] 
HYPOTHETICAL 32.2 KD 
PROTEIN C22E 12.04 IN 
CHROMOSOME I >gi| 1220279 
(Z70043) unknown 



(D64003) polyA polymerase

|hVpoth£t/CaI I'u.d KD 

PROTEIN C34E10.8 IN
CHROMOSOME III
>gi|500731 (U10402) weakly
similar to protein C kinase 
substrate [Caenorhabditis 



DVA-1 POLYPROTEIN
PRECURSOR nematode
polyprotein antigen precursor
[Dictyocaulus viviparus]
>gi|1585421|prf||2124414A
polyprotein antigen/allergen
[Dictyocaulus viviparus]



(AB018320) KIAA0777 protein
[Homo sapiens]



2.7 



2.2 



1.9 



1.8 



1.2 



1.1






Nearest Neighbor (BlastN vs. Genbank)



ACCESSION | DESCRIPTION | P VALUE | ACCESSION



890 | D83329



891 



Mus musculus DNA
for prostaglandin D2
synthase, complete
cds



Brassica rapa mRNA 
for SRK45, complete 
AB012106 cds 



892 | U76524



893 | AF027173



Sambucus nigra
ribosome inactivating
protein precursor
mRNA, complete cds

Arabidopsis thaliana
cellulose synthase
catalytic subunit (Ath 
A) mRNA, complete 
cds 



0.007 



0.007 



0.007 



894 | Y11918



895 | L36827 binding protein I



896 | L36827



H.sapiens IMAGE
cDNA clone 26SS1



Mus musculus



Mus musculus
alphaA-crystallin- 
binding protein I 



0.007 



0.007 



0.007 



897 | Z65719 



898 | AF064029



899 | AF051730



H.sapiens CpG DNA. 
clone 54c10, reverse
read cpg54c10.rt1a.



0.007 



0.007 



Helianthus tuberosus
lectin 1 mRNA. 
complete cds 



Mus musculus 
cathepsin S (CatS)
gene, exon 6 



0.007 



0.007 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



1001741 



1723928 



121452 



927287 



1055188 



4063019 



1097307 



1174915 



1707017 



DESCRIPTION 



(D64004) hypothetical protein 



HYPOTHETICAL 11.6 KD 
PROTEIN IN NUT1-ARO2
INTERGENIC REGION
PRECURSOR YGL149w -
yeast (Saccharomyces

MOLECULAR WEIGHT 
SUBUNIT 12 PRECURSOR 
>gi|82606|pir||A24266 glutenin
high molecular weight chain 12
precursor - wheat >gi|21779



(U30294) ORF2 [Prevotella 
ruminicola]



(L40061) contains similarity to 
transmembrane domains like 
those found in sugar transporter 
proteins 



porter. 



4063019 TliF [Pseudomonas fluorescens]



(AF083061) ABC transporter
TliF [Pseudomonas fluorescens]



HIC-1 gene [Homo sapiens]
UTROPHIN (DYSTROPHIN-
RELATED PROTEIN 1)
(DRP1) (DRP)

>gi|284488|pir||S28381 utrophin
protein) [Homo sapiens]



P VALUE 



0.97 



0.94 



0.79 



0.35 



0.26 



0.21 



0.20 



0.20 



(U78721) RNA helicase isolog
[Arabidopsis thaliana] 



0.002 



0.001 






Nearest Neighbor (BlastN vs. Genbank) 



SEQ 

ID | ACCESSION



900 



DESCRIPTION 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins! 



U62398 



cuniculus 
gp42/basigin/OX- 
47/HT7 mRNA, 
complete cds. 



ACCESSION 



901 



902 



903 



M.musculus
glutathione reductase
X76341 mRNA.

Rat (lambda 20B0.5)
M-type 6-
phosphofructo-2-
kinase/fructose-2,6-
M26215 bisphosphatase



Homo sapiens 
KIAA0442 mRNA. 
AB007902 partial cds



0.007 



0.007 



0.007 



904 



Lactococcus lactis 
cremoris plasmid 
pNZ4000 insertion 
sequence IS982 
putative transposase
gene and eps gene 
cluster 

(epsRXABCDEFGHI 
U93364 JKL), complete cds 



905 



AF093268 



906 



AF100694



907 | AF074386



908 



Rattus norvegicus 
homer-1c mRNA,
complete cds 



Mus musculus 
Pontin52 mRNA, 
complete cds 



0.007 



0.007 



0.006 



Sambucus nigra
hevein-like protein
mRNA, complete cds 



0.006 



AF027174 



909 | AJ005813



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath 
B) mRNA, complete 
cds 



0.006 



0.006 



Arabidopsis thaliana 
mRNA for 
neoxanthin cleavage 
enzyme



0.006 



DESCRIPTION 



2370494 



3513303 



3036809 



2662165 



2731377 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE>



P VALUE 



(298944) hypothetical protein 



2e-04 



(AC005594) R26984J [Homo 
sapiens] 



(AL022373) putative protein
(AB007902) H«07l2 cDNA
clone for KIAA0442 has a 574-
bp insertion at position 1474 of 
the sequence of KIAA0442. 
[Homo sapiens] 



8e-07 



6e-15 



2e-17



(U28739) similar to alcohol 
dehydrogenase/ribitol 
dehydrogenase [Caenorhabditis 

elegans]



<NONE> 



<NONE> 



1e-31



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 






Nearest Neighbor (BlastN vs. Genbank)



SEQ 

ID | ACCESSION



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



910 | AF027174



911 | AF093268



912 | AF093268



913 | AB012106



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath 
B) mRNA, complete 
cds 



I 



914 | AF064029



915 | AF100694



916 | AF093268



Rattus norvegicus 
homer-1c mRNA,
complete cds
Rattus norvegicus
homer-1c mRNA,

complete cds 

Brassica rapa mRNA 
for SRK45, complete 
cds 

Helianthus tuberosus 
lectin 1 mRNA, 
complete cds 



Mus musculus 
Pontin52 mRNA, 
complete cds 



0.006 



0.006 



0.006 



0.006 



0.006 



0.006 



Rattus norvegicus 
homer-1c mRNA,
complete cds 



0.006 



917 | AF100694



918 | AF012899



919 | X80289



Mus musculus 
Pontin52 mRNA. 
complete cds 



0.006 



Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA. complete cds 



0.006 



H.sapiens PTPL1
mRNA for protein
tyrosine phosphatase



0.006 



DESCRIPTION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



4049856 



3880536 



3877761 



116S791 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



(AF063866) ORF MSV064 
hypothetical protein 
[Melanoplus sanguinipes 
entomopoxvirusl 

mimu — 



predicted using 
Genefinder; similar to Lectin C
type domain short and long
forms (2 domains); cDNA EST
EMBL:C10633 comes from this
gene; cDNA EST
EMBL:C12424 comes from this
gene; cDNA EST yk191e7.3
comes from this ... 



<NONE> 



<NONE> 



9.6 



(2S1552) F56G4.1 

[Caenorhabditis elegans] 

>gi|3S7S6I5|enl|PID|el348240 
(ZS3118) F56G4.1 



7.9 



CATHEPSIN E PRECURSOR 
precursor - rabbit >gi|402729 
(L0S41S) procathepsin E 



7.5 



7.4 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION



DESCRIPTION



P VALUE 



920 | AF074386



921 



Sambucus nigra 
hevein-like protein
mRNA, complete cds 



_ 



U72396 



922 



AF074387 



923 | AB012106



924 



AF074386 



925 



Lycopersicon 
esculentum class II 
small heat shock 
protein Le-HSP17.6
mRNA, complete cds 



0.006 



Sambucus nigra 
hevein-like protein
mRNA, complete cds
Brassica rapa mRNA
for SRK45, complete
cds 

Sambucus nigra 
hevein-like protein
mRNA, complete cds 



0.006 



0.006 



0.006 



0.006 



AF012899



926 



927 



AF064029 



Sambucus nigra
ribosome inactivating
protein precursor
mRNA, complete cds



0.006 



Helianthus tuberosus
lectin 1 mRNA,
complete cds 



0.006 



AJ005813



Arabidopsis thaliana 
mRNA for 
neoxanthin cleavage 
enzyme



928 



U33949 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



0.006 



Human Down 
Syndrome region of 
chromosome 21,
genomic sequence, 
clone A12H1-1A6. 



0.006 



1346371 



2196567 



2113798 



1388166 



2496785 



416592 



416592 



325S5S4 



DESCRIPTION
DIACYLGLYCEROL



KINASE

DIACYLGLYCEROL
KINASE)

>gi|477059|pir||A47744
diacylglycerol kinase (EC
2.7.1.107) beta - rat 90kDa-
diacylglycerol kinase [Rattus



(DS8588) lipoprotein 
[Escherichia coli]



(Z83259) AmphiBrf38
[Branchiostoma floridae]

(U58282) Bowel [Drosophila
melanogaster]



HYPOTHETICAL 20.1 KD
PROTEIN Y4YS
A-AGGLUTININ

ATTACHMENT SUBUNIT
PRECURSOR
>gi|101170|pir||A41258 a-
agglutinin core protein AGA1 -
yeast (Saccharomyces
cerevisiae)

A-AGGLUTININ

ATTACHMENT SUBUNIT
PRECURSOR
>gi|101170|pir||A41258 a-
agglutinin core protein AGA1 -
yeast (Saccharomyces
cerevisiae)



(U41263) The 3' UTR of this
gene overlaps the 3' UTR of
T19D12.6 (confirmed by EST
hits) [Caenorhabditis elegans]



3850997 



(AF067150) beta-hydroxyacyl- 
ACP dehydratase precursor 



P VALUE 



5.5



4.3 



4.3



4.3 



4.2



2.7 



2.5 



2.0 



1.9 







Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



1175 | AF027171
1176 | Y09232



1177 | AJ005813



1178 | AF100694



1179 | AF072847



1180 | AF012899



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
A) mRNA, complete 

cds 

H.sapiens fertilin 
alpha pseudogene
Arabidopsis thaliana
mRNA for
neoxanthin cleavage
enzyme 
Mus musculus 
Pontin52 mRNA. 
complete cds 
Homo sapiens 
putative swelling- 
activated chloride 
channel (CLNS1A)
gene, intron 6 



Sambucus nigra
ribosome inactivating 
protein precursor 
mRNA, complete cds 



Sambucus nigra
ribosome inactivating
protein precursor
1181 | U76524 mRNA, complete cds



1182 | AF027173



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath 
A) mRNA, complete 
cds 



P VALUE 



2e-04 
2e-04 



2e-04 



2e-04 



2e-04 



2e-04 



2e-04 



2e-04 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



<NONE> 
<NONE> 



<NONE> 



<NONE>



<NONE> 



<NONE> 



<NONE> 



1213557 



DESCRIPTION 



<NONE> 
<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



ins) 



P VALUE 



<NONE> 
<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 

(U50199) coded for by C.
elegans cDNA yk89e9.5; coded
for by C. elegans cDNA cm7g5;
coded for by C. elegans cDNA
cm14b9; coded for by C.
elegans cDNA yk52g5.5; coded
for by C. elegans cDNA
yk76e5.5; coded for by C.
elegans cDNA yk131f11.5; c...



<NONE> 



8.4 






Nearest Neighbor (BlastN vs. Genbank)



SEQ 

ID | ACCESSION




Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



1183 



Lycopersicon 
esculentum cytosolic
class II small heat
shock protein HCT2
(HSP17.4) mRNA,
AF090115 complete cds



1184 | AF012899



1185 | AF074386



Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA. complete cds 



Sambucus nigra 
hevein-like protein 
mRNA, complete cds 



1186 | AF027173



Arabidopsis thaliana 
cellulose synthase 

r 

catalytic subunit (Ath 
A) mRNA, complete 
cds 



1187 | AJ005813



1188 | AF027174



Arabidopsis thaliana 
mRNA for 
neoxanthin cleavage 
enzyme



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath 
B) mRNA, complete 
cds 



P VALUE ACCESSION



DESCRIPTION 



2e-04 



729008 



2e-04 



2507582 



1085500 



2e-04 



2623967 



2e-04 



P VALUE 



DOMAIN RECEPTOR 1
PRECURSOR (TYROSINE-
PROTEIN KINASE CAK)
(CELL ADHESION KINASE)
(TYROSINE KINASE DDR)
(DISCOIDIN RECEPTOR
TYROSINE KINASE) (TRK E)
(PROTEIN-TYROSINE
KINASE RTK 6) sapiens]

PROTEIN IN MOLR-BGLX 
INTERGENIC REGION 
>gi|1788436 (AE000300)
putative regulator [Escherichia
coli]



collagen alpha 1(IX) chain -

mouse musculus]

>gi|744962|prf||2015346A

collagen:SUBUNIT=alpha1:ISO
TYPE=IX [Mus musculus]



(Y13942) GTN Reductase
[Agrobacterium radiobacter]



2497316 



GLYCOSYLATION END
PRODUCT-SPECIFIC
RECEPTOR PRECURSOR
(RECEPTOR FOR
ADVANCED
GLYCOSYLATION END
PRODUCTS) products receptor
precursor - bovine >gi|163651
(M91212) receptor for advanced
glycosylation end products [Bos
taurus]



8.3 



7.8 



7.8 



7.4 



5.3 



2e-04 



001710 



(D64004) hypothetical protein | 3.5






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



1189 | AJ005813



1190 | AF027173



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



Arabidopsis thaliana 



mRNA for 
neoxanthin cleavage 
enzyme



DESCRIPTION 



2e-04 



Arabidopsis thaliana 
cellulose synthase

* 

catalytic subunit (Ath 
A) mRNA, complete 
cds 



1191 | U72396



Lycopersicon 
esculentum class II 
small heat shock 
protein Le-HSP17.6
mRNA, complete cds 



1192 | AF090115



1194 | AF100694



1195 AF064029 



1196 | L34219



1197 | X51890



Lycopersicon 
esculentum cytosolic 
class II small heat 
shock protein HCT2 
(HSP17.4) mRNA, 
complete cds 

Chlamydomonas 
reinhardtii dynein
heavy chain alpha
(ODA11) gene, exons

Mus musculus 
Pontin52 mRNA,
complete cds

Helianthus tuberosus 
lectin 1 mRNA,
complete cds 



Homo sapiens
retinaldehyde-binding
protein (CRALBP) 
gene, complete cds. 



2e-04 



2e-04 



2e-04 



Rhesus monkey
interleukin-3 gene



1e-04



1e-04



3258584 



1e-04



1e-04



2736338 



2196567 



3319874 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



gene overlaps the 3' UTR of 
T19D12.6 (confirmed by EST
hits) [Caenorhabditis elegans] 



(U41263) The 3' UTR of this



P VALUE 



2.1 



(AF038623) contains similarity
to RNA recognition motifs



0.39 



(D83588) lipoprotein 
[Escherichia coli]



0.69 



(AJ006096) F-spondin 
[Branchiostoma floridae] 



5e-04 



(Z81077) predicted using 
Genefinder; Similarity to Yeast
rotein3243 (TR:G537531) | 2e-09 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



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_| Nearest Neighbor (Blast N vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)




SEQ 
ID 


ACCESSION


DESCRIPTION


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 








Plasmodium 












1198 


AE001421


falciparum 
chromosome 2, 
section 58 of 73 of 
the complete 
sequence 


1e-04


<NONE> 


<NONE> 


<NONE> 


1199 


AF090115 


Lycopersicon 
esculentum cytosolic 
class II small heat
shock protein HCT2 
(HSP17.4) mRNA. 
complete cds 


1e-04


<NONE>


<NONE> 


♦ 

<NONE> 


1200 


AF027174 


Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
B) mRNA, complete 
cds 


1e-04


2576287 


(Y15086) HepC protein 
[Cylindrotheca fusiformis]


4.7 


1201 


AJ005813 


Arabidopsis thaliana
mRNA for
neoxanthin cleavage
enzyme


1e-04


3395673 


(AB016623) RWC-3 [Oryza
sativa] 


0.14


1202 


AP03S035 


Homo sapiens 
BRCA1-associated
RING domain protein 
(BARD1) gene, 
exons 2 and 3 


9e-05 


<NONE> 


<NONE> 


<NONE> 


1203


AJ005813


Arabidopsis thaliana 
mRNA for 
neoxanthin cleavage 
enzyme


9e-05 


<NONE> 


<NONE> 


<NONE> 


1204 


AB012106


Brassica rapa mRNA 
for SRK45, complete 
cds


9e-05 


<NONE> 


<NONE> 


<NONE> 


1205


U95098


Xenopus laevis
mitotic
phosphoprotein 44
mRNA, partial cds


9e-05 


<NONE> 


<NONE> 


<NONE> 


1206


AF034099


Laccaria bicolor
glyoxal malate
synthase protein
mRNA, complete cds


9e-05


1351553


HYPOTHETICAL
LIPOPROTEIN MG348
PRECURSOR
>gi|136'166S|pir||E64238
hypothetical protein MG348 -
Mycoplasma genitalium (SGC3)
>gi|384493l

8.3 








Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















1207


D50006 


Human DNA for 
alpha-platelet-derived 
growth factor 
receptor, exon 6-10


9e-05 


3063639 


(AF056494) NADH 
dehydrogenase subunit 5 
[Panorpa japonica] 


5.1 


1208


U50423 


Human Down 
Syndrome region of 
chromosome 21, 
clone A4IB8-1B7. 


9e-05 


124273 


INHIBIN ALPHA CHAIN 
PRECURSOR bovine 
>»ill63l95 fMH^73^ inhibin A 
subunit [Bos taurus] 


3.0 


1209 


AJ005813 


Arabidopsis thaliana 
mRNA for 
neoxanthin cleavage
enzyme 


9e-05 


4007782 


(X72850) 2,4-
dihydroxybenzoate
monooxygenase [Sphingomonas
sp.]


2.3 


1210 


AC005276 


Homo sapiens clone 
fragment 

UWGC;gap3 from 
7q31.3, complete
sequence [Homo 
sapiens] 


9e-05 


1492075 


(U60315) MC132L [Molluscum
contagiosum virus subtype 1]


1.0 


1211 


AF100694


Mus musculus
Pontin52 mRNA,
complete cds 


9e-05 


2887423 


sapiens] 


2e-10


1212 


X77772 


C. fuscus gamma-M2-
1 crystallin mRNA.


9e-05 


2072425 


(U83115) non-lens beta gamma-
crystallin like protein [Homo
sapiens]


7e-25 


1213 


AB012106 


Brassica rapa mRNA 
for SRK45, complete 
cds 


8e-05 


<NONE> 


<NONE> 


<NONE> 


1214 


L06178 


Apis mellifera
ligustica complete
mitochondrial
genome


8e-05


<NONE> 


<NONE> 


<NONE> 


1215 


AB012106


Brassica rapa mRNA
for SRK45, complete 
cds 


8e-05 


<NONE> 


<NONE>


<NONE> 


1216 


U76524 


Sambucus nigra 
ribosome inactivating 
protein precursor
mRNA, complete cds


8e-05 


<NONE> 


<NONE> 


<NONE> 


1217 


L06178


Apis mellifera
ligustica complete
mitochondrial
genome


8e-05


<NONE> 


<NONE>


<NONE> 


1218


AB012106


Brassica rapa mRNA
for SRK45, complete
cds


8e-05


<NONE> 


<NONE> 


<NONE> 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



1219 



1220 



ACCESSION 



DESCRIPTION 



AF100694



AB012106



1221 



AB012106 



1222 



AF027174 



1223 



AF024605 



1224 



Y13143



1225 AJ005813



1226 



1227 



AF027173 



AF100694



Mus musculus 



Pontin52 mRNA, 
complete cds 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



P VALUE 



Brassica rapa mRNA 
for SRK45, complete
cds 



Brassica rapa mRNA 
for SRK45, complete 
cds 



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
B) mRNA, complete 
cds 



Homo sapiens serine 
protease-like protease
Sequence 2 from 
patent US 5736377 



Rattus norvegicus
mRNA for PAG608 
gene 



Arabidopsis thaliana 
mRNA for 
neoxanthin cleavage
enzyme



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath-
A) mRNA, complete
cds 



Mus musculus
Pontin52 mRNA,
complete cds 



ACCESSION 



8e-05 



8e-05 



8e-05 



8e-05 



8e-05 



8e-05 



8e-05 



8e-05 



8e-05 



<NONE> 



<NONE> 



1722841 



1205991 



3242783 



2314243 



1170586 



DESCRIPTION 



P VALUE 



464239 



<NONE> 



<NONE> 



WNT-11 PROTEIN
PRECURSOR (XWNT-11)
clawed frog >gi|439108 
(L23542) maternal protein 



(U35637) nebulin [Homo 
sapiens] 



(AF055354) respiratory burst 
oxidase protein B 



(AE000616) alpha-ketoglutarate 
permease (kgtP)



RAS GTPASE-ACTIVATING-
LIKE PROTEIN IQGAP1
(P195) (KIAA0051)
>gi|627594|pir||A54854 Ras
GTPase activating-related
protein - human sapiens]
>gi|536844 (L33075) ras
GTPase-activating-like protein
[Homo sapiens]



<NONE> 



<NONE> 



544353 



NADH- UBIQUINONE 
OXIDOREDUCTASE CHAIN 
4>gi|l0851S5|pir||S5:96S 
NADH dehydrogenase chain 4 - 
honeybee mitochondrion 
(SGC4) >gi|552446 (L06178) 
NADH dehydrogenase subunit 4 
[Apis mellifera ligustica]



F-SPONDIN PRECURSOR 



9.9 



9.6 



8.6 



8.1 



7.8 



3.5 



3-D 



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Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















1228


U76524 


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds 


8e-05 


483243 


apolipoprotein B- 100 - chicken 
(fragment) 


3.4 


1229 


AF093268 


Rattus norvegicus 
homer-1c mRNA,
complete cds 


8e-05 


91207 


proline-rich protein - mouse 
(fragment) musculus] 


2.2 


1230 


AF027173 


Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
A) mRNA, complete 
cds 


8e-05


2499181


ZONADHESIN PRECURSOR
>gi|1066466


2.2 


1231 


AF027173 


Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
A) mRNA, complete 
cds 


8e-05 


2499181 


ZONADHESIN PRECURSOR 
>gi|1066466


1.9 


1232 


AB012106 


Brassica rapa mRNA 
for SRK45, complete 
cds 


8e-05 


2833647 


(AF027972) flagelliform silk
protein [Nephila clavipes] 


1.6 


1233 


AF093268 


Rattus norvegicus 
homer-1c mRNA,
complete cds 


8e-05 


1163063


(Z49821) MYO2
[Saccharomyces cerevisiae] 


0.90 


1234 


AF027174 


Arabidopsis thaliana
cellulose synthase
catalytic subunit (Ath-
B) mRNA, complete 
cds 


8e-05 


1653488 


(D90914) hypothetical protein 


0.30 


1235 


M26510 


Chicken nonmuscle 
myosin heavy chain 
(MHC) gene, 
complete cds. 


8e-05 


112159 


plectin - rat 


0.003 


1236 


U56402 


Human chromatin 
structural protein 
homolog


8e-05 


2088823 


(AF003384) weak similarity to 
the peptidase family A2 


1e-13


1237 


AF100694


Mus musculus
Pontin52 mRNA,
complete cds


8e-05


437181 


(U02289) GTPase-activating 
protein [Caenorhabditis elegans] 


2e-17 


1238


AF100694


Mus musculus 
Pontin52 mRNA, 
complete cds 


8e-05 


465983 


HYPOTHETICAL 80 S KD 
PROTEIN ZC21.4 IN
CHROMOSOME III 


8e-27 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION


DESCRIPTION 


P VALUE 
















1239 


AF090115 


Lycopersicon 
esculentum cytosolic 
class II small heat 
shock protein HCT2 
(HSP17.4) mRNA, 
complete cds 


7e-05 


<NONE> 


<NONE> 


<NONE> 


1240 


U83656


Rattus norvegicus NP-
KB gene, promoter
region


7e-05


3880858 


(AL031633) predicted using 
Genefinder; cDNA EST 
yk304f12.5 comes from this
gene [Caenorhabditis elegans]


9.3 


1241 


AF074387 


Sambucus nigra 
hevein-like protein
mRNA, complete cds 


7e-05 


3080538 


(AL022600) hypothetical 
protein 


9.2 


1242 

X89398


H.sapiens ung gene 
for uracil DNA- 
glycosylase 


7e-05


549700


HYPOIHLULAL IjJ iOJ 

rKUlhliN IN ivlDn 1- VMA3 
INTERGENIC REGION 
>gi|539182|pir||S37908 
hypothetical protein YKL083w -
yeast (Saccharomyces
cerevisiae) >gi|486120
(Z28082) ORF YKL083w 


1.8 


1243 


M83753 


Bovine follicle 
stimulating hormone-
beta subunit gene, 
complete cds. 


7e-05 


2398621 


(AJ000342) DMBT1 protein. 
5.8 kb transcript [Homo sapiens] 


LS 


1244 


M80829


Rat troponin T
cardiac isoform gene,
complete cds 


5e-05 


854065 


(X83413) (JS8 [Human
herpesvirus 6] 


2e-08 


1245 AF074387


Sambucus nigra
hevein-like protein
mRNA, complete cds


4e-05 


120240 


FLAGELLIN B2 PRECURSOR 
Methanococcus voltae 
>gi|150063 (M72148) flagellin


5.2


1246 


AF012899 


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA. complete cds 


3e-05 


<NONE> 


<NONE> 


<NONE> 


1247 


U76524 


Sambucus nigra 
ribosome inactivating 
protein precursor
mRNA, complete cds


3e-05


<NONE>


<NONE> 


<NONE> 


1248 


AF074386


Sambucus nigra
hevein-like protein
mRNA, complete cds


3e-05


<NONE>


<NONE> 


<NONE> 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






1249


AF093268


Rattus norvegicus
homer-1c mRNA,
complete cds


3e-05 


<NONE> 


<NONE> 


<NONE> 


1250 


AB012106


Brassica rapa mRNA 
for SRK45, complete 
cds 


3e-05 


2773226 


(AF039716) Similar to protein
kinase [Caenorhabditis elegans]


6.7 


1251 


AF100694


Mus musculus
Pontin52 mRNA,
complete cds


3e-05


2072961


(U93568) putative p150 [Homo
sapiens]


5.6 


1252 


U72396 


Lycopersicon 
esculentum class II 
small heat shock 
protein Le-HSP17.6
mRNA, complete cds


3e-05 


121855 


EXOGLUCANASE II
PRECURSOR cellulose 1,4-beta-
cellobiosidase (EC 3.2.1.91) II
precursor - fungus (Trichoderma
reesei) 1,4-beta-cellobiosidase
(EC 3.2.1.91) II - fungus
cellobiohydrolase II
[Trichoderma reesei]


4.6


1253 


U76524 


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA. complete cds 


3e-05 


3880516 


(AL021572) similar to CTP 
SYNTHASE (EC 6.3.4.2) (UTP--
AMMONIA LIGASE) (CTP
SYNTHETASE) 


3.3 


1254 


M88299 


Mouse brain-1 POU-
domain protein, 
complete cds. 


3e-05 


1947048 


(U66102) intimin [Escherichia 
coli] 


3.0 


1255 


U95098


Xenopus laevis
mitotic
phosphoprotein 44
mRNA, partial cds 


3e-05 


3122872 


CELL-CYCLE NUCLEAR
AUTOANTIGEN SG2NA
(S/G2 NUCLEAR ANTIGEN)
>gi|1082650|pir||JC2522 nuclear
autoantigen - human >gi|30j095
(U17989) GS2NA


2.8 


1256 


U76524 


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds 


3e-05 


1352145 


CYTOCHROME c oxidase 

POLYPEPTIDE I chain I - 
Thermus aquaticus >gi|155083
(M84341) cytochrome c oxidase 
subunits precursor [Thermus 
thermophilus] 


2.6 


1257 


U72396 


Lycopersicon 
esculentum class II 
small heat shock 
protein Le-HSP17.6
mRNA, complete cds


3e-05 


2811015 


SEGMENTATION POLARITY 

PROTEIN ENGRAILED 
>gi|2076747 (U42429) 
engrailed [Anopheles gambiae] 
>gi|214S9tS (U42214) 
engrailed [Anopheles gambiae]


2.0 
Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION


DESCRIPTION


P VALUE
















1258 


AF027174 


Arabidopsis thaliana
cellulose synthase 
catalytic subunit (Ath- 
B) mRNA, complete 
cds 


3e-05 


1657752 


(U62325) FE65-like protein 
[Homo sapiens] 


1.7 


1259 


AF100694


Mus musculus 
Pontin52 mRNA, 
complete cds 


3e-05 


2072961 


(U93568) putative p150 [Homo
sapiens] 


1.5 


1260


U76523 


Sambucus nigra lectin 
precursor mRNA, 
complete cds 


3e-05 


1352145 


CYTOCHROME C OXIDASE 

POLYPEPTIDE I chain I - 
Thermus aquaticus >gi|155083
(M84341) cytochrome c oxidase
subunits precursor [Thermus
thermophilus]


1.1


1261 


X91890 


H.sapiens regulatory 
region of HOXA7 
gene 


3e-05 


111013 


Sxr (Bkm-homolog) sex- 
determining region protein - 
mouse 


1.0 


1262 


L36936 


Homo sapiens metase 
gene, partial cds. 




3e-05 


1944352 


(D84239) IgG Fc binding 
protein [Homo sapiens]


0.99 


1263 


AB012105 


Brassica rapa mRNA
for SLG45, complete
cds 


3e-05 


417782 


SMP2 PROTEIN 
>gi|320853|pir||S30911 SMP2 
protein - yeast (Saccharomyces 
cerevisiae) gene 
[Saccharomyces cerevisiae]


0.89 


1264 


U76524 


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds 


3e-05 


1703501 


INTEGRIN ALPHA CHAIN-
LIKE PROTEIN alpha Intlp 
[Candida albicans] 


0.39 


1265 


AF090115 


Lycopersicon 
esculentum cytosolic 
class II small heat
shock protein HCT2
(HSP17.4) mRNA,
complete cds 


3e-05 


1587031


cis-Golgi matrix protein GM130
[Rattus norvegicus]


0.20 


1266 


Z81014


Human DNA
sequence from
cosmid U65A4,
between markers
DXS366 and DXS87
on chromosome X * 


3e-05 


2072964 


(U93569) putative p150 [Homo
sapiens] 


0.049 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



1267 



1268



1269 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



Z96668 



AB012105



H. sapiens telomeric
DNA sequence, clone
7PTEL001, read
7PTELOO001.seq 



Brassica rapa mRNA 
for SLG45, complete 
cds 



DESCRIPTION 



glycosylated and myristilated



P VALUE 



3e-05 



542429 



3e-05 



AF074385 



Sambucus nigra 
hevein-like protein
mRNA, complete cds 



1879121 



smaller surface antigen -
Plasmodium falciparum
>gi|S36640 (X76298)
glycosylated and myristilated
smaller surface antigen gallus]
>gi|1092178|prf||2023165B
surface antigen | 0.029

(Z70310) predicted using
Genefinder; Similarity to Mouse
ankyrin (PIR Acc. No. S37771);
cDNA EST EMBL:T01923
comes from this gene; cDNA
EST EMBL:D32335 comes
from this gene; cDNA EST
EMBL:D32723 comes from this
gene; cDNA ES... Genefinder;
Similarity to Mouse ankyrin
(PIR Acc. No. S37771); cDNA
EST EMBL:T01923 comes
from this gene; cDNA EST
EMBL:D32335 comes from this
gene; cDNA EST
EMBL:D32723 comes from this
gene; cDNA ES... | 2e-l



3e-05 



2497677 



ZYXIN (ZYXIN 2) sapiens]

i4|gnl|PID|e223417
(X95735) zyxin



2e-23 



1270 



AF027174 



Arabidopsis thaliana
cellulose synthase
catalytic subunit (Ath-
B) mRNA, complete
cds



1e-05



<NONE> 



<NONE> 



<NONE> 



1271 



X16318



Canine mRNA for
signal recognition
particle 54k protein



1e-05



3122612 



PITUITARY HOMEOBOX 3 
(HOMEOBOX PROTEIN 
PITX3) >gi|2645427 
(AF005772) homeobox protein 
Pitx3 [Mus musculus]



4.4 



1272 



AB012105 



Brassica rapa mRNA 
for SLG45, complete
cds



1e-05



1652458 



(D90905) DNA mismatch repair
protein MutL [Synechocystis
sp.]



0.62
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Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 













1273 


U57843 


Human 

phosphatidylinositol 
3-kinase delta 
catalytic subunit 
mRNA, complete cds


1e-05


475909 


(X67098) ORFlA [Homo
sapiens]


0.22 


1274 


Z96569


H. sapiens telomeric
DNA sequence, clone
2QTEL054, read
2QTELOO054.seq


1e-05


2137043 


unknown protein - rabbit 
(fragment) cuniculus] 


0.005 


1275 


AE000810 


Methanobacterium 
thermoautotrophicum 
from bases 172512 to 
182957 (section 16 of
148) of the complete
genome


1e-05


3o77j7v 


kinensin-like protein KIF4
(SW:P33174); cDNA EST
EMBL:D27320 comes from this
gene; cDNA EST
EMBL:D27322 comes from this
gene; cDNA EST
EMBL:D27321 comes from this
gene; cDNA EST 
EMBL:D35764 comes... Mouse 
kinensin-like protein KIF4 
(SW:P33174);cDNA EST 
EMBL:D27320 comes from this 
gene; cDNA EST 
EMBL:D27322 comes from this 
gene; cDNA EST 
EMBL:D27321 comes from this 
gene; cDNA EST 
EMBL:D35764 comes... 


6e-27 


1276 


AB012113 


Homo sapiens gene 
for CC chemokine 
PARC precursor, 
complete cds 


9e-06 


<NONE> 


<NONE>


<NONE> 


1277


AC005830


Homo sapiens XpJ'i- 
154-155 BAC GSHB-
52411 (Genome
Systems Human BAC
Library), complete 
sequence [Homo 
sapiens] 


9e-06 


<NONE> 


<NONE> 


<NONE> 


1278


D36245 


Human MHC (HLA) 
DRB intron 1 DNA, 
partial sequence 


9e-06 


1051253 


(U37531) mucin apoprotein 
[Mus musculus]


1.3 


1279 


D79998 


Human mRNA for 
KIAA0176 gene, 
partial cds 


9e-06


2833253 


HYPOTHETICAL PROTEIN 
KIAA0176 sapiens]


4e-06
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Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 












(Z^yo-Oj iimuanty to V east 




1280 


U10246


Toxoplasma gondii 
RH uracil 
phosphoribosyl 

complete cds. 


9e-06 


3876090 


uridine kinase

(SW:URK1_YEAST); cDNA
EST EMBL:Z14695 comes
from this gene; cDNA EST 
CEMSE 17F comes from this 
gene; cDNA EST 
EMBL:D67355 comes from this 
gene, LL/i ^ c\ no i yR-iuyn i.j 
comes from this °e... 


7e-33 


1281


U10246


Toxoplasma gondii 
RH uracil 
phosphoribosyl 

complete cds. 


9e-06 


3876090 


(£oyoj:>) Similarity to Yeast 
uridine kinase 

(SW:URK1_YEAST); cDNA
EST EMBL:Z14695 comes
from this gene; cDNA EST 
CEMSE 17F comes from this 
gene; cDNA EST 
EMBL:D67355 comes from this 
gene, cuin rv cj i yitzuvn i .j 
comes from this ge...


7e-34 


1282 


AF012899


Sambucus nigra
ribosome inactivating
protein precursor
mRNA, complete cds


8e-06


<NONE> 


<NONE> 


<NONE> 


1283 


AF012899


Sambucus nigra
ribosome inactivating
protein precursor
mRNA, complete cds


8e-06


<NONE>


<NONE>


<NONE>


1284 


U66340 


Human Rh blood 

Si l Ulip V_ ill 111 ~t 1 1 

(RHCE) gene, exon 
2, partial cds


8e-06 


1707155 


(U80837) F07E5.6 gene product 
[Caenorhabditis elegans]


9.6 


1285 


AF012899


Sambucus nigra
ribosome inactivating
protein precursor
mRNA, complete cds


7e-06 


<NONE> 


<NONE> 


<NONE> 


1286 


M29930 


Human insulin
receptor (allele 2)
gene, exons 14, 15,
16 and 17.


4e-06 


<NONE> 


<NONE> 


<NONE> 


1287 


L42103 


Homo sapiens
(subclone 5_d3 from
P1 H25) DNA
sequence. 


3e-06 


<NONE> 


<NONE>


<NONE> 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ
ID


ACCESSION


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






1288


AF012244


Mus musculus
cerberus-like (Cer-1)
gene, complete cds


3e-06 


<NONE> 


<NONE> 


<NONE> 




1289 


Z69366


Human DNA
sequence from
cosmid L96F3,
Huntington's Disease
Region, chromosome
4p16.3 contains EST.


3e-06








1290 


Z69366 


Human DNA 

sequence from
cosmid L96F3,
Huntington's Disease
Region, chromosome
4p16.3 contains EST.


3e-06 


<NONE> 


<NONE> 


<NONE>


1291 X85232


H.sapiens 
chromosome 3 
sequences 


3e-06 


<NONE> 


<NONE> 


<NONE> 


1292 


M32674 


Human platelet 
glycoprotein IIIa,
exons 7, 8 and 9.


3e-06 


<NONE> 


<NONE> 


<NONE> 


1293 


D16879


Human HepG2 3'
region cDNA, clone
hmd2a01 


3e-06 


998296 


(U33484) ependymin 
[Hemiodus sp.] 


,e 


1294 


U18614 


Lagothrix lagotricha
interphotoreceptor
retinoid-binding
protein (IRBP) gene,
intron 1, complete
sequence


3e-06


1 0 1 Jo40 


(U71440) polyprotein [Rice 
tungro spherical virus] 


■ 


1295 


AF090115 


Lycopersicon 
esculentum cytosolic
class II small heat
shock protein HCT2

(HSP17.4) mRNA, 
complete cds 


3e-06 


1477646 


(U53204) plectin [Homo
sapiens] >gi|1477651 (U63610) 
plectin [Homo sapiens] 


4.0 


1296 AF016898


Homo sapiens B-ATF 
gene, complete cds 


3e-06 


1085177 


reverse transcriptase - fruit fly 
reverse transcriptase 
[Drosophila yakuba] 


3.0 


1297 


AB018490


Homo sapiens DNA,
trinucleotide repeats
region


3e-06 


3876572


(Z81522) predicted using
Genefinder; similar to RNA 
recognition motif, (aka RRM, 
RBD, or RNP domain) 
[Caenorhabditis elegans] 


3.0 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















1298


AF027174


Arabidopsis thaliana
cellulose synthase
catalytic subunit (Ath-
B) mRNA, complete
cds 


3e-06 


4240137 


(AB020631) KIAA0824 protein 
[Homo sapiens]


2.7 


1299 


M37929 


Homo sapiens 
adenosine 
monophosphate 
deaminase 1 
(AMPD1) gene, 
exons 11-12.


3e-06 


1653775


(D90916) thiol:disulfide
interchange protein DsbD 
[Synechocystis sp.] 


1.7 


1300 


M37929 


Homo sapiens 
adenosine 
monophosphate 
deaminase 1 
(AMPD1) gene, 
exons 11-12.


3e-06 


1653775 


(D90916) thiol:disulfide
interchange protein DsbD
[Synechocystis sp.]


1.7 


1301 


U60496 


Glycine max actin 
(SoyS6) gene, partial
cds 


3e-06 


1730738 


ACTIN-LIKE PROTEIN ARP5
Ynl2430o [Saccharomyces
cerevisiae] 


2e-05 


1302 


X14363 


Yersinia 

pseudotuberculosis 
rplC, rplD, rplW,
rplB and rpsS genes
for ribosomal proteins
L3, L4, L23, L2 and 
S19 


3e-06 


585879 


50S RIBOSOMAL PROTEIN 
L2 maritima >gi|437926 
(Z21677) ribosomal protein L2 


2e-12


1303 


234969 


microsatellite 
polymorphism 


2e-06 


<NONE> 


<NONE> 


<NONE> 


1304 


X64707 


H.sapiens BBC1 
mRNA 


te-06 


<NONE> 


<NONE> 


<NONE> 


1305 


AC005830


Homo sapiens Xpl2- 
154-155 BAC GSHB- 
52411 (Genome 
Systems Human BAC 
library), complete 
sequence [Homo
sapiens]


1e-06


<NONE> 


<NONE> 


<NONE> 


1306 


J04058 


Human electron
transfer flavoprotein
alpha-subunit mRNA,
complete cds.


1e-06


<NONE> 


<NONE> 


<NONE> 



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Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION


P VALUE 


ACCESSION 


DESCRIPTION


P VALUE 
















1307 


L25647


Homo sapiens 
fibroblast growth 
factor receptor gene 
(located in the central 
MHC) signal peptide
and consecutive exon


1e-06


1586734 


mxcQ gene [Methylobacterium 
organophilum]


5.4 


1308 L26261


Human MHC class III
HLA-RP1 gene.


1e-06


1684985 


(U20633) NADH 
dehydrogenase subunit 
[Neuwiedia veratrifolia]




1309 


AF002283


Mus musculus alpha-
actinin-2 associated
LIM protein mRNA,
alternatively spliced
product, complete cds


1e-06


2996196


(AF053367) carboxyl terminal
LIM domain protein [Mus
musculus]


4e-17


1310 


M10935


Human haptoglobin
gene (alpha-2 allele),
complete cds and
haptoglobin-related
gene, exon 1 and
three Alu repeats. 


6e-07 


<NONE> 


<NONE> 


<NONE> 


1311 


AC002251


Homo sapiens
(subclone 1_g6 from
BAC H76) DNA
sequence


4e-07


2144491


coagulation factor Xa (EC
3.4.21.6) precursor norvegicus]


4.2 


1312 


AF047717 


Streptomyces 
chrysomallus
actinomycin
synthetase II (acmB) 
gene, complete cds 


4e-07 


699196 


(U15181) 4-coumarate-CoA
ligase [Mycobacterium leprae]


1e-06


1313 


U14417


Human Ral guanine 
nucleotide 
dissociation 
stimulator mRNA, 
partial cds. 


4e-07 


544402 


GUANINE NUCLEOTIDE
DISSOCIATION
STIMULATOR RALGDS
FORM A (RALGEF)
>gi|321257|pir||S2S4!5 guanine
nucleotide dissociation
stimulator ralGDS - mouse
>gi|193573 (L07924) guanine
nucleotide dissociation
stimulator [Mus musculus]


Se-OS 


1314 


Z79027


H. sapiens flow-sorted
chromosome 6
HindIII fragment,
SC6pA20G3 


3e-07 


<NONE> 


<NONE> 


<NONE> 
Nearest Neighbor (BlastN vs. Genbank)



1315 U67167 



1316 AF086256 



1317 U67228 



1318 U94346



1319 Y15724 



1320 X13596



1323



1325



ACCESSION DESCRIPTION P VALUE ACCESSION 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



1322 Z55905 



X03541



1324 AF027766 



D13613 



Homo sapiens 



intestinal mucin 
(MUC2) gene, 
promoter region and 
partial cds 



Homo sapiens full 
length insert cDNA 
clone ZD41CU 



Human clone HS4.61 
Alu-Yai sequence 



Human calpain-like 
protease (htra-3)
mRNA, complete cds



Homo sapiens 
SERCA3 gene, exons
1-7 (and joined CDS) 



Bean DNA for
glycine-rich cell wall
protein GRP 1.8



Homo sapiens
cytosolic selenium-
dependent glutathione
peroxidase gene,

1321 M83094 rhoh12 gene, 3' end



H.sapiens CpG DNA,
clone 71f4, forward 
read cpg7 lf4.fr la . 



Human mRNA of trk
oncogene > ::

gb|I96186|I96186
Sequence 23 from 
patent US 5734039 



Canis familiaris Y-
linked zinc finger
protein 



Bovine mRNA for
rabphilin-3A,
complete cds > ::
dbj|E07809|E07809
DNA encoding
rabphilin-3A



DESCRIPTION 



3e-07 



<NONE> 



3e-07 



<NONE> 



3e-07 



1938437 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



(U97003) contains similarity to 
C4-type zinc fingers and a 
ligand-binding domain of 
nuclear hormone receptors 



3e-07 



2911858 



1e-07



<NONE> 



1e-07



<NONE> 



1e-07



1326385 



1e-07



1076802 



1e-07



325465 



1e-07



220643 



1e-07



2822161 



(AF047659) No definition line 
found [Caenorhabditis elegans]



0.39 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



(U58751) C07G1.7 gene
product [Caenorhabditis
elegans]



extensin-like protein - maize 
>gi|600118 mays] 



0.61 



(M74509) [Human endogenous
retrovirus type C oncovirus
sequence.], gene product [Homo
sapiens]



3e-04 



(D10628) zinc finger protein
[Mus musculus]



7e-08 



(AC004082) rab3 effector-like;
35% Similarity to AF007336
(PID:g2317778) [Homo
sapiens]



6e-11
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Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 

ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






1326


X57110


Human mRNA for c-
cbl proto-oncogene


1e-07


323270


(J04169) gag-onc fusion protein
[Cas NS1 retrovirus]


3e-14


1327 


X57110 


Human mRNA for c-
cbl proto-oncogene


1e-07


115855 


PROTO-ONCOGENE C-CBL 
human >gi|29731 (X57110) c-
cbl protein [Homo sapiens]


4e-19


1328 


AC001178 


Homo sapiens 
(subclone 2_g12 from
BAC H94) DNA 
sequence 


4e-08 


<NONE>


<NONE> 


<NONE> 


1329 


U11866 


Human interleukin-8 
receptor type B 
(IL8RB) gene,
promoter and exons 1-
6


4e-08


<NONE>


<NONE> 


<NONE> 


1330 


AC001225 


Homo sapiens 
(subclone 2_e6 from 
BAC H94) DNA 
sequence 


4e-08


478184 


histone HI IM (clone L95) - 
midae 


« 


1331 


M73837 


Human modulator 
recognition factor 2 
(MRF-2) mRNA, 
complete cds. 


4e-08 


141448 


HYPOTHETICAL 32.6 KD
PROTEIN IN TRANSPOSON
TN4556 >gi|80758|pir||JQ0428
hypothetical 32.6K protein -
Streptomyces fradiae transposon 
Tn4556 


4.7 


1332 


AC006164 


Homo sapiens clone 
UWGC;y28gap from 
6p21, complete
sequence [Homo
sapiens]


4e-08


2580578 


(AF000996) ubiquitous TPR 
motif, Y isoform [Homo 
sapiens] 


1.2 


1333


X01060 


Human mRNA for 
transferrin receptor 


4e-08


135514 


T-CELL RECEPTOR BETA 
CHAIN PRECURSOR 
precursor-(ANA 11)- rabbit 


0.61 


1334 


Y10697 


H.sapiens INE2 
mRNA 


4e-08


124909 


INSULIN RECEPTOR- 
RELATED PROTEIN 
PRECURSOR (IRR) (IR-
RELATED RECEPTOR) 
>gi|lS6555 sapiens] 


0J4 


1335 


U60416 


Rattus norvegicus 
myr 6 myosin heavy 
chain mRNA. 
complete cds 


4e-08


102189 


myosin I. high molecular weight 
- Acanthamoeba sp 


3e-0S 
Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ
ID


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 












HYPO I'Hh IILAL 35.JXD 




1336 


U23804


Drosophila
melanogaster putative
GTP-binding
regulatory protein 
beta chain (GPB) 
mRNA, partial cds. 


4e-08 


2494916 


TRP-ASP REPEATS

CONTAINING PROTEIN
T10F2.4 IN CHROMOSOME
III protein; similar to G-Beta
repeat region (Trp-Asp
domains) of guanine nucleotide
binding protein


1e-28


1337 


AE000213 


Escherichia coli K-12 
MG1655 section 103
of 400 of the 
complete genome 


4e-08 


3294172


(AL022325) tF27C3.l.l 
(protein similar to C. elegans 
protein B0035.16) (isoform 1)
[Homo sapiens] 


2e-67 


1338 


D89821


Mus musculus mRNA 
for RhoVI, complete 
cds 


2e-08 


3024539 


RHO-RELATED GTP- 
BINDING PROTEIN RHOD 
(RHO-RELATED PROTEIN 
HP1) (RHOHP1) sapiens]


1e-04


1339 


U74382 


Human telomeric 
repeat DNA-binding 
protein (PIN2) 
mRNA, complete cds 


1e-08


<NONE> 


<NONE> 


<NONE> 


1340 


L35657 


Homo sapiens 
(subclone HS 5_a10
from P1 35 H5 C8)
DNA sequence.


1e-08


<NONE> 


<NONE> 


<NONE> 


1341 


L21936 


Human succinate 
dehydrogenase 
flavoprotein subunit 


1e-08


3201678


(AF060886) adenine
phosphoribosyltransferase 
[Leishmania tarentolae] 


4.0 


1342 


AB009777 


Homo sapiens gene
for osteonidogen,
promoter region


1e-08


479388 


tritin - wheat 

>gi|391929|gnl|PID|d1003454


2.2 


1343 


M58600 


Human heparin 
cofactor II (HCF2) 
gene, exons 1 through
5.


1e-08


1730173 


GLUCOSE-6-PHOSPHATE 
ISOMERASE, CYTOSOLIC 2 
(GPI) (PHOSPHOGLUCOSE 
ISOMERASE) (PGI) isomerase 
[Clarkia concinna]


1.9 


1344 


M58600


Human heparin
cofactor II (HCF2)
gene, exons 1 through
5.


1e-08


1730173 


GLUCOSE-6-PHOSPHATE 
ISOMERASE, CYTOSOLIC 2
(GPI) (PHOSPHOGLUCOSE
ISOMERASE) (PGI) isomerase
[Clarkia concinna]


1.7


1345 


AC000980 


Homo sapiens
(subclone 1_s2 from
P1 H31) DNA
sequence


1e-08


439877


(L27428) reverse transcriptase
[Homo sapiens] 


1.1 
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Nearest Neighbor ( BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















1346 


U48734


Human non-muscle
alpha-actinin mRNA,
complete cds


1e-08


168237 


(M76546) hydroxyproline-rich 
protein [Helianthus annuus] 


0.19 


1347 


M76724 


Human leukocyte 
adhesion receptor 
alpha subunit 


le-08 


1177607


(X92485) pval [Plasmodium 
vivax] 


0.19 


1348 


AF067959 


Gallus gallus 
homeodomain protein 
HOXD-3 mRNA, 
complete cds 


1e-08


3165574


(AF067942) No definition line 
found [Caenorhabditis elegans] 


0.15 


1349 


Z81014


Human DNA 
sequence from 
cosmid U65A4, 
between markers 
DXS366 and DXS87 
on chromosome X * 


le-08 


2072964 


(U93569) putative p150 [Homo
sapiens] 


0.001 


1350 


X57103 


Human h-lys gene for
lysozyme (upstream 
region) 


7e-09 


<NONE> 


<NONE> 


<NONE> 


1351


AF012899


Sambucus nigra
ribosome inactivating
protein precursor 
mRNA, complete cds 


7e-09 


231629


BILE-SALT-ACTIVATED 
LIPASE PRECURSOR ESTER 
LIPASE) (STEROL 
ESTERASE) (CHOLESTEROL 
ESTERASE) salt-activated 
lipase [Homo sapiens] sapiens] 


0.22 


1352 


L34741 


Aplysia californica 
prohormone 
convertase (PC2) 
mRNA, complete cds. 


5e-09


322054


cytochrome-c oxidase (EC
1.9.3.1) chain II precursor -
Synechocystis sp. (PCC 6803)
>gi|581739 sp.] 


5.0 


1353 


AF052959 


Homo sapiens type
XV collagen
(COL15A1) gene,
exon 6 


4e-09 


131269 


PHOTOSYSTEM II P680
CHLOROPHYLL A 
APOPROTEIN (CP-47 
PROTEIN) 

>gi|7270S|pir||QJLV6A 
photosystem II chlorophyll a- 
binding protein psbB - liverwort 
(Marchantia polymorpha) 
chloroplast >gi|l 1700 


IS 
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Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 




Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



DESCRIPTION 



(AGMATINE
UREOHYDROLASE) (AUH)
(PROCLAVAMINIC ACID
AMIDINO HYDROLASE)
>gi|1361423|pir||S57669
Proclavaminic acid amidino
hydrolase - Streptomyces
clavuligerus >gi|295171
Proclavaminic acid amidino
hydrolase [Streptomyces
clavuligerus]

>gi|1586122|prf||2203286B
proclavaminic acid amidino
hydrolase [Streptomyces
clavuligerus]



P VALUE 



GENERAL SECRETION 
PATHWAY PROTEIN L
product (Klebsiella pneumoniae 
>gi|14931t (M32613) pulL 



<NONE> 



(AB002302) KIAA0304 [Homo
sapiens]



(U58658) unknown [Homo
sapiens]



4e-13



2.5 



<NONE> 



5.0 



1.3 



!!!! ALU CLASS F WARNING 
ENTRY !!!! 



0.72 
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| Nearest Neighbor (BiastN vs. Gen bank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION


DESCRIPTION 


P VALUE 














1360 


AF002283 


Mus musculus alpha-
actinin-2 associated
LIM protein mRNA, 
alternatively spliced 
product, complete cds 


6e-10 


2996196 


(AF053367) carboxyl terminal 
LIM domain protein [Mus 
musculus]


4e-21


1361 


M26220 


African green 
monkey origin of 
replication 


5e-10


2143455 


gene DMR-N9 protein - mouse 
(fragment) 


8.8 


1362 


Z78006 


H.sapiens flow-sorted 
chromosome 6 
HindIII fragment,
SC6pA7F10 


4e-10


2072977 


(U93574) putative p150 [Homo
sapiens] 


0.005 


1363 


U82303 


Homo sapiens 
unknown protein 
mRNA, partial cds 


2e-10


182571! 


(USS 1 83) similar to the 
immunoglobulin superfamily, 
most similar to nerual cell 
adhesion proteins 
[Caenorhabditis elegans]


0.031 


1364 


AF079764 


Drosophila
melanogaster
enhancer of
polycomb


2e-10


3757890 


(AF079764) enhancer of 
polycomb [Drosophila 
melanogaster] 


1e-10


1365 


L24123 


Homo sapiens NRF1 
protein (NRF1)
mRNA. 


2e-10 


3004573 


(AC004520) similar to NFE2-
related transcription factors;
similar to 148694
(PID:g2137676) [Homo
sapiens]


4e-53 


1366 


M91454 


Orangutan alpha- 
globin gene duplicate 
region.


1e-10


464239


NADH- UBIQUINONE 
OXIDOREDUCTASE CHAIN 
4>gi|l0S5lS5|pir[|S5296S 
NADH dehydrogenase chain 4 - 
honeybee mitochondrion 
(SGC4) >gi|552446 (L06178)
NADH dehydrogenase subunit 4
[Apis mellifera ligustica]


6.0 


1367 


D87117


House mouse;
Musculus domesticus
brain mRNA for
SAP102, complete
cds


6e-11


473912 


(L31961) phosphoprotein [Mus
cookii]


2.2


1368


AC001002 


Homo sapiens
(subclone 2_h9 from 
PI H39) DNA 
sequence 


5e-11


<NONE>


<NONE>


<NONE>



WO 01/02568 



PCT/US00/18374



SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



1369 



1370 



1371 



1372 



ACCESSION 



DESCRIPTION 



P VALUE 



Homo sapiens
(subclone 2_h9 from
PI H39) DNA
AC001002 sequence



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



5e-11



Homo sapiens 
KIAA0414 mRNA,
AB007 374 partial cds 

Homo sapiens
(subclone 2_h9 from
PI H39) DNA
AC001002 sequence



5e-11



5e-11



Homo sapiens
(subclone 2_h9 from
PI H39) DNA
AC001002 sequence



5e-11



Homo sapiens
(subclone 2_h9 from
PI H39) DNA
1373 | AC001002 sequence



5e-11



Homo sapiens
(subclone 2_h9 from
PI H39) DNA
1374 | AC001002 sequence



5e-11



H.sapiens mRNA for
HERV-K long
1375 | Z21352 terminal repeat



5e-11



Homo sapiens mRNA
for KIAA0459
1376 | AB007923 | protein, partial cds | 5e-11

House mouse;
Musculus domesticus
brain mRNA for
SAP102, complete
1377 | D87117 | cds | 5e-11



Homo sapiens DNA
sequence between
two AML1 gene
1378 | AJ131501 | promoters. 6423 BP | 5e-11



Human endogenous 
retroviral protease 
1379 | M27326 | mRNA, complete cds. | 5e-11



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



419481 



2947238 



DESCRIPTION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



gag polyprotein - human 
endogenous virus S71 



[Homo sapiens] 



473912 



728831 



8855S 



P VALUE 



(L31961) phosphoprotein [Mus 
cookii] 



!!!! ALU SUBFAMILY J
WARNING ENTRY 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



4.6 



2.3 



retroviral proteinase-like protein
human 



l.S 



0.20 



0.002 




SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



P VALUE 



1380



1381 



Drosophila 

melanogaster putative 
GTP-binding 
regulatory protein 
beta chain (GPB) 
U23304 mRNA. partial cds. 



M.musculus troponin
Z22734 | I gene.



1382 



1383 



Homo sapiens 
KIAA0420 mRNA, 
AB007880 complete cds



5e-11



3e-11



Homo sapiens BAX
gene, exon 6, partial
AF020361 sequence



1384 



Homo sapiens DNA 
L35600 sequence.



1385 



1386 



Human organic anion
transporting
U21943 polypeptide



2e-11



2e-11



2e-11



1387 



Homo sapiens 
carboxyl terminal 
U90878 LIM domain protein 



U31929 



1388



M25828 



1389 | AB020648



1390 | Z15026



Human orphan 
nuclear receptor 
(DAX1) gene,

complete cds 

Human von 

Willebrand factor 
gene, exon 1. 2, and 
3, and three Alu 
repetitive elements. 



2e-11



2e-11



6e-12



Homo sapiens mRNA
for KIAA0841
protein, partial cds 



1391 



1392 



L28101 



Z47046 



H.sapiens genes for 
tumor necrosis factor 
(Tnfa) and
lymphotoxin (Tnfb)



6e-12



3e-12



Homo sapiens 
kallistatin (PI4) gene, 
exons 1-4, complete 
cds 



2e-12 



2e-12 



Human cosmid 
QLL2C9 from Xq28



2e-12



ACCESSION 



DESCRIPTION 



2494916



3892202 



<NONE> 



<NONE> 



1174952 



2738223 



HYPOTHETICAL :o.2KD
TRP-ASP REPEATS
CONTAINING PROTEIN
T10F2.4 IN CHROMOSOME
III protein; similar to G-Beta 
repeat region (Trp-Asp 
domains) of guanine nucleotide 
binding protein 



P VALUE



1e-30



(AF072889) transcription 
repressor brain factor 2 



0.053



<NONE>



<NONE>



<NONE> 



<NONE> 



GLYCOPROTEIN D 
PRECURSOR gD [Bovine 
herpesvirus 1]



0.25 



(U95011) brain-specific organic
anion transporter | 9e-19 



2996196 



(AF053367) carboxyl terminal 
LIM domain protein [Mus 
musculus] 



4e-23 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<N0NE> 



<NONE> 



<NONE> 



<NONE> 







1393 | Z79007



1394 | U34377



H.sapiens flow-sorted
chromosome 6
HindIII fragment,
SC6pA20E2 
Human tyrosine 
kinase TXK (txk) 
gene, exon 13. 



1395 D70845 



1396 M63978 



Mus musculus apg-1
gene for novel
member of heat shock
protein 110, promoter
region 



Human vascular 
endothelial growth 
factor gene, exon 8. 



1397 U60266 



1398 Z68297



1399 | Z68297



Homo sapiens
lysosomal alpha-
mannosidase (manB)
mRNA, complete cds
Caenorhabditis
elegans cosmid
F11A10, complete
sequence
[Caenorhabditis
elegans]
Caenorhabditis
elegans cosmid
F11A10, complete
sequence
[Caenorhabditis
elegans]



2e-12



1e-12



106322 



151484 



hypothetical protein (LIH 3'
region) - human 

(M55524) ORF 4; putative
[Pseudomonas aeruginosa]



1e-12



113658 



1e-12



3982737 



3e-13



<NONE>



1400 
1401 



Z63385
X76104 



Human DNA
sequence from
cosmid L21F12B,
Huntington's Disease
Region, chromosome
4p16.3, contains
EST.

H.sapiens DAP-
kinase mRNA



1402 Z78668



H.sapiens flow-sorted
chromosome 6 TaqI 
fragment, 

SC6pA13G4 



7e-13



2393734 



ALKALINE PROTEINASE 
PRECURSOR (ALP) precursor 
fungus (Acremonium 
chrysogenum)



(AF069731) calmodulin-
dependent protein kinase II beta
M isoform [Rattus norvegicus]



7e-13



2393734 



6e-13 
6e-13



<NONE> 
2911154 



1403 L35600 



Homo sapiens DNA 
sequence.



5e-13



106322



3e-13



3134290 



<NONE> 

(AC002542) similar to C.
elegans F11A10.5; 80%
similarity to Z68297
(PID:g1130619) [Homo
sapiens]

(AC002542) similar to C.
elegans F11A10.5; 80%
similarity to Z68297
(PID:g1130619) [Homo
sapiens]



1.5 



4.3 



3.5 



0.083 



<NONE> 



5e-34 



<NONE> 
(AB007143) ZIP-kinase [Mus
musculus] 



hypothetical protein (LIH 3' 
region) - human 



(AC004136) hypothetical
protein [Arabidopsis thaliana]



3e-38 



<NONE> 



0.007 



2e-06 



1.7 



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SEQ 
ID 






Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



1404 



AF090452 



DESCRIPTION 



Cloning vector 



pKODT complete 
sequence 



1405 | D28126



1406 AF005219 



1407 



AB018301



Human gene for ATP
synthase alpha
subunit, complete cds
(exon 1 to 12)



Nearest Neighbor ( BlastX vs. Non-Redundant Proteins ) 



P VALUE 



ACCESSION 



Homo sapiens 
transcription factor 
HOXD13



1408 



D70845 



1409 AG000691 



1410 | D30785



1411 



U32710 



1412 



1413 



AG000886 



Z62664 



1414 



AB014532



Homo sapiens mRNA 
for KIAA0758 
protein, partial cds 



Mus musculus apg-1 
gene for novel 
member of heat shock 
protein 110, promoter
region



Homo sapiens
genomic DNA, 21q
region, clone:
T17IBG33 



Mouse mRNA for 
neuropsin, complete 
cds 



Haemophilus 
influenzae Rd section 
25 of 163 of the 
complete genome 



Homo sapiens
genomic DNA, 21q
region, clone:
64E11X19



H.sapiens CpG DNA,
clone 71d11, forward
read cpg71d11.ft1a.



Homo sapiens mRNA
for KIAA0632
protein, partial cds



2e-13



2e-13 



2e-13 



2e-13



1e-13



8e-14



8e-14



8e-14



7e-14 



7e-14 



3876730 



419481 



2822166 



3882237 



113658 



930045 



3559978 



4106673 



1363925 



DESCRIPTION 



P VALUE 



(Z49966) F35C11.4 
[Caenorhabditis elegans]



7.3 



gag polyprotein - human 
endogenous virus S71 



(AB018301) KIAA0758 protein 
[Homo sapiens] 



ALKALINE PROTEINASE 
PRECURSOR (ALP) precursor 
fungus (Acremonium 
chrysogenum)



(AJ005641) serine protease
[Rattus rattus]



(AL035064) queuine trna-
ribosyltransferase
[Schizosaccharomyces pombe]



hypothetical protein 2 - North 
American opossum (fragment) 
>gi|89772l (248955) ORF-2, 
putative RT [Didelphis 
virginiana] 



3953461 



7e-14 



113668 



(AC002328) F20N2.6 
[Arabidopsis thaliana] 



!!!! ALU CLASS C WARNING 
ENTRY !!!! 



3.4 



(AC004080) transcription factor 
HOXA13 [Homo sapiens] | 5e-09



1e-23



3.1 



(X15332) alpha-1 (III) collagen
[Homo sapiens] | 3e-04



2e-12 



2e-38 



Li 



0.085 



0.040 




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Nearest Neighbor fBlastN vs. Gen bank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















1415 


Z96473 


H.sapiens telomeric 
DNA sequence, clone 
20PTEL004, read 
20PTELOO004.seq 


7e-14


2931631


(AB012223) ORF2 [Canis
familiaris] 


2e-04 


1416 


AF100694


Mus musculus 
Pontin52 mRNA, 
complete cds 


4e-14 


<NONE> 


<NONE> 


<NONE> 


1417 


AF012899


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA. complete cds 


4e-14 



<NONE> 


<NONE> 


<NONE> 


1418


AF033349 


Homo sapiens MLL 
gene breakpoint 
cluster region, intron 
1. partial sequence 


3e-14


728831


!!!! ALU SUBFAMILY J
WARNING ENTRY 


9.3 


1419 


AC001526 


Homo sapiens 
(subclone 4_f6 from
PI H54) DNA 
sequence 


3e-14 


99861 


extensin - almond >gi|20420
(X65718) extensin 


9.2 


1420 


AF012899


Sambucus nigra
ribosome inactivating
protein precursor
mRNA, complete cds


3e-14


728832 


!!!! ALU SUBFAMILY SB 
WARNING ENTRY 


0.15 


1421 


AF100694


Mus musculus 
Pontin52 mRNA. 
complete cds 


2e-14 


3913573 


EPHRIN-A2 PRECURSOR 
(EPH-RELATED RECEPTOR
TYROSINE KINASE LIGAND 
6) (LERK-6) sapiens] 
>gi|292476l (AC004258) 
EPL6 HUMAN [Homo sapiens] 


8.7


1422 


AF012899


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA. complete cds 


9e-15 


119040 


E1B PROTEIN, SMALL T-
ANTIGEN (E1B 19K)
>gi|74142|pir||QIAD25 early
E1B 21K protein II - human
adenovirus 5 >gi|584S9
(X02996) mRNA 5 first reading
frame [Human adenovirus type
5] adenovirus type 5]
>gi|209797 (J01969) 21 kD
protein 


1.5 








Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 












transcription factor GATA-4,




1423 


AF012899


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA. complete cds 


8e-15


477102 


retinoic acid-inducible - mouse 
>gi|293345 (M98339) GATA- 
binding transcription factor 
[Mus musculus] 


0.57 


1424 


AB012223 


Canis familiaris LINE 
1 element ORF2 
mRNA. complete cds 


8e-15


92385


hypothetical protein - rat
(fragment)


0.003


1425 | AF100694


Mus musculus
Pontin52 mRNA,
complete cds


3e-15


<NONE> 


<NONE> 


<NONE> 


1426 


X12433


Human pHS1-2
mRNA with ORF
homologous to
membrane receptor
proteins


3e-15


422532 


collagen alpha 3(IV) chain - sea 
urchin 


8.9 


1427 


AF012899


Sambucus nigra
ribosome inactivating
protein precursor
mRNA, complete cds


3e-15


1353143


PROBABLE NUCLEAR
HORMONE RECEPTOR
E02H1.7

>gi|3875431|gnl|PID|e1344980
(Z47075) similar to Zinc finger,
C4 type (two domains)
[Caenorhabditis elegans]


5.0 


1428 


Z69651


Human DNA
sequence from
cosmid L75B9,
Huntington's Disease
Region, chromosome
4p16.3


3e-15


403460


(L24521) transformation-related
protein [Homo sapiens]


0.60 


1429 


AF012899 


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds 


2e-15 


108750 


Ig heavy chain precursor 
(B/MT.4A.17.H5.A5) - bovine 
>gi|440 (X62916) anti- 
testosterone antibody [Bos 
taurus] 


1.1 


1430 


X83299


H.sapiens SMA3 
mRNA 


2e-15 


671530 


(X83299) SMA3 gene product 
[Homo sapiens]


0.32 


1431 


U01877


Human p300 protein 
mRNA, complete cds,
> gb|I62297|I62297
Sequence 1 from
patent US 5658784


2e-15


3024341 


E1A-ASSOCIATED PROTEIN 
P300 


0.019 



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ACCESSION 



DESCRIPTION 



1432 



Mouse MHC (Qa) Q2
k gene for class I
X16516 | antigen, exons 4-8



1433 | M74165



Chicken tensin
mRNA, complete cds



1434| X7189: 



H.sapiens gene for 
immunoglobulin 
kappa light chain 
variable region 04 
and 05 



Human Rar protein 
1435 1 UQ.S7 27 ImRNA complete cds 



1436 | M23404



Chicken erythrocyte 
anion transport 
protein (band3) 
mRNA, complete cds 



Rat mRNA for liver a-
L-Fucosidase (EC
1437 | X16145 | 3.2.1.51)



1438 | AF012899



1439 AF076981



Sambucus nigra
ribosome inactivating
protein precursor
mRNA, complete cds
Mus musculus brain 
mitochondrial carrier 
protein BMCP1 
(Bmcpl) mRNA, 
complete cds 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



DESCRIPTION 



1e-15



2496897



1e-15



283920



9e-16



9e-16 



3036779 



9e-16 



726403 



|HVPUlHhllL.AL4y 1 KdJ 

[chromosome ni 

>gi|3874384|gnl|PID|e 1344078 
EST EMBL:C08256 comes 
from this gene; cDNA EST 
EMBL:C09941 comes from this 
gene; cDNA EST yk340a10.3
comes from this gene; cDNA
EST yk340a10.5 comes from
this gene [Ca...

tensin - chicken >gi|212752
(M74165) tensin



P VALUE 



7e-08 



2e-19



<NONE> | <NONE> | <NONE>

|(Z»44/y; match: multiple" 

proteins; match: 000407 

Q 12829 P22127 P36861 

Q40219; match: P70550 

Q41022 P22125 Q08155 

P35286; match: P51148 P51147

P35293 P36861 P35289; match:

P35284 Q40217 P51152

P51157 P51158; match: Q41022 | 3e-06



9e-16 



67502 



(U23175) similar to anion
exchange protein

[Caenorhabditis elegans]
alpha-L-fucosidase (EC
3.2.1.51) 1 precursor, tissue -
human >gi|178409 (M29877)
alpha-L-fucosidase precursor
(EC 3.2.1.51) [Homo sapiens]



1e-28



8e-16



<NONE> 



8e-16 



3851540 



<NONE> 



(AF073544) brain mitochondrial 
carrier protein-1 [Homo sapiens]



2e-29 



<NONE> 






2e-13








Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






H.sapiens MN/CA9 






!!!! ALU SUBFAMILY J 




1440 


Z54349 


GENE 


5e-16


728831


WARNING ENTRY 


0.002 


1441 


AF077003 


Mus musculus SH3 
domain-containing 
adapter protein 
mRNA. complete cds 


3e-16 


309123 


(M35526) complement 
component C5D [Mus 
musculus] 




1442 


X64587 


M.musculus mRNA
for splicing factor
U2AF (65 kD)


3e-16


2143767


glycoprotein - rat >gi|986943
(L08134) glycoprotein [Rattus
norvegicus] norvegicus]


0.003 


1443 


AB014561 


Homo sapiens mRNA
for KIAA0661
protein, complete cds


3e-16


3327136


(AB014561) KIAA0661 protein
[Homo sapiens]


1e-20


1444


Z73987


Human DNA
sequence from
cosmid N120B6 on
chromosome 22
Contains ESTs.
complete sequence
[Homo sapiens]


1e-16


<NONE> 


<NONE> 


<NONE> 


1445 


M5S31S 


Homo sapiens ala 
gene. 


1e-16


<NONE> 


<NONE> 


<NONE> 


1446 


U44103 


Human small GTP 
binding protein Rab9 
mRNA, complete cds 


1e-16


1552584 


(Z80233) hypothetical protein
Rv0029 


1.3 


1447 


AB014561 


Homo sapiens mRNA
for KIAA0661
protein, complete cds 


9e-17 


3327136 


(AB014561) KIAA0661 protein 
[Homo sapiens]


2e-20 


1448 


AF100694


Mus musculus
Pontin52 mRNA,
complete cds


1e-17


<NONE> 


<NONE> 


<NONE> 


1449 


M76762 


Mus musculus
ribosomal protein (Ke
3) gene, exons 1 to 5,
and complete cds.


1e-17


1073048


pupR protein - Pseudomonas
putida >gi|525260


0.36 


1450 


D50561 


Human DNA
replication enhancing
element (REE1)


4e-18



126295 


LINE-1 REVERSE 

TRANSCRIPTASE 

HOMOLOG 


0.78 


1451 


D16431


Human mRNA for
hepatoma-derived
growth factor,
complete cds


4e-18


3242079


(AJ006984) proline-rich protein


0.0 IS 







SEQ 
ID 



ACCESSION 



DESCRIPTION 



P VALUE 



Mus musculus heat
shock protein hsp40-3
1452 | AF083983 | mRNA, complete cds



Human methyl sterol
oxidase (ERG25)
1453 | U60205 | mRNA, complete cds



Homo sapiens clone
23899 mRNA
1454 AF038177 sequence



1457 



1458 



1459 



1460 



1461



ACCESSION 



4e-18 



3e-18 



Homo sapiens mRNA 
for KIAA0661 
1455 | AB014561 | protein, complete cds



Homo sapiens mRNA
for KIAA0661
1456 | AB014561 | protein, complete cds



1e-18



1e-18



Human tyrosine 
kinase TXK (txk) 
U34374 gene, exons 9 and 10



Homo sapiens
hGAA1 mRNA,
AB006969 complete cds



Human mRNA for
KIAA0295 gene,
AB002293 partial cds



Z59664 



H.sapiens CpG DNA, 
clone 168f9, reverse
read cpg168f9.rt1a



M73837



Human modulator
recognition factor 2
(MRF-2) mRNA,
complete cds.



1e-18



1e-19



1e-19



1e-19



5e-20 



3873707 



<NONE> 



1360775 



3327136 



3327136 



<NONE> 



4151809 



2224531 



33 80251 



DESCRIPTION 



P VALUE 



(Z73102) Similarity to B-subtilis 
DNAJ protein 

(SW;DNAJ_BACSU); cDNA 
EST yk437al.5 comes from this 
gene [Caenorhabditis elegans] 



<NONE> 

G protein-coupled receptor 74 - 
equine herpesvirus 2 >gi|695246 
(U20824) G protein-coupled 
receptor [Equine herpesvirus 2] 



(AB014561) KIAA0661 protein
[Homo sapiens] 



9e-25 



<NONE> 



5.1 



(AB014561) KIAA0661 protein
[Homo sapiens] 



<NONE> 

(AF102855) synaptic SAPAP-
interacting protein Synamon



(AB002293) KIAA0295 [Homo 
sapiens] 



5e-20 



28431. 



(Z82055) predicted using
Genefinder



modulator recognition factor 2 - 
human factor 2 [Homo sapiens] 



1e-21



1e-22



<NONE> 



0.19 



6e-17 



6.5 



0.019 






Nearest Neighbor (BlastN vs. Genbank)



SEQ

ID | ACCESSION



1462 U24267 



1465 



1467 



DESCRIPTION P VALUE ACCESSION 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



Human pyrroline-5- 

carboxylate 

dehydrogenase



Mus musculus myelin
gene expression
1463 | U13262 factor



Mus musculus myelin
gene expression
1464 | U13262 factor



H.sapiens CpG DNA,
clone 48f10, forward
Z61239 read cpg48f10.ft1a.



1466 U89915



Mus musculus
junctional adhesion
molecule (Jam)
mRNA, complete cds



1468



Gallus gallus p52 pro- 
apototic protein 
AF029071 mRNA. complete cds 



M25636 



Figure 4. Nucleotide 
sequence of the 
pKS36 1.797 kb 
insert. 



1469 



Homo sapiens mRNA 
for KIAA0848
AB020655 protein, complete cds



5e-20 



4e-20 



4e-20 



4e-20 



le-20 



7e-22 



6e-22 



6e-22 



2506350 



536926 



3126873 



1669601 



3462455 



2599492 



1196398 



4240325 



DESCRIPTION 



P VALUE 



DELTA-1-PYRROLINE-5-
CARBOXYLATE
DEHYDROGENASE
PRECURSOR (P5C
DEHYDROGENASE)
>gi|1353243 sapiens]

>gi|1353250 (U24267) pyrroline-
5-carboxylate dehydrogenase
[Homo sapiens]
>gi|1589585|prf||2211355A
Delta 1-pyrroline-5-carboxylate
dehydrogenase [Homo sapiens]



(U13262) myelin gene 
expression factor [Mus 
musculus] 



(U89915) junctional adhesion
molecule [Mus musculus] 



(AF029071) p52 pro-apototic 
protein [Gallus gallus] 



(M21305) unknown protein 
[Homo sapiens] 



(AB020725) KIAA0918 protein
[Homo sapiens]



5e-04 



3e-07 



(AF061S32) M4 protein 
deletion mutant [Homo sapiens] | le*0S 



(D88747) AR401 [Arabidopsis 
thaliana] I 8e-19 



7e-11



1e-15



0.65 



1e-19






Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



chorionic 

gonadotropin beta 1
(CG beta 1) subunit
Homo sapiens
microsatellite
AF053066 D5S2926 sequence



1470 S80935



1471



1472 U55177 



Danio rerio carbonic 
anhydrase homolog 
CAH-Z mRNA. 
complete cds



1473 | AF064250



Gallus gallus
ubiquitin specific
protease 66



1474 



Homo sapiens
pendrin (PDS)
AF030880 mRNA, complete cds



1475 AF100694



1476 X57398



1477 AB010998



1478 D10871



Mus musculus
Pontin52 mRNA,
complete cds



Human mRNA for
pM5 protein 



DESCRIPTION 



P VALUE 



5e-22 



2e-22 



2e-22 



2e-22 



2e-22 



6e-23 



3e-23 



Rattus norvegicus
PAD-R1I mRNA for 
Peptidylarginine 
deiminase type I, 
complete cds 



Human hNAT allele
2-2 gene for
arylamine N-
acetyltransferase



1479 D10871



Human hNAT allele
2-2 gene for
arylamine N-
acetyltransferase



3e-23



2e-23 



2e-23 



115310 



728831 



3123190 



729367 



<NONE> 



107350 



<NONE> 



>gi|849l7|pir||A31S93 collagen 
alpha 1(IV) chain precursor -
fruit fly (Drosophila
melanogaster) melanogaster]
>gi|157078 (M96575) type IV
collagen pro-collagen
[Drosophila melanogaster]

!!!! ALU SUBFAMILY J
WARNING ENTRY



CARBONIC ANHYDRASE 
(CARBONATE 

DEHYDRATASE) >gi|2576335 
(U55177) CAH-Z [Danio rerio]



(AF016107) ubiquitin specific
2736064 | protease 41 [Gallus gallus]

DRA PROTEIN (DOWN-
REGULATED IN ADENOMA)
>gi|2135020|pir||A47456 down-
regulated in adenoma (DRA) -
human >gi|291964 (L02785)
Nuclear localization signal at
AA 569-573, 576-580, 579-583
acidic transcr. activ. domain 620

RrObOX inui i l 6-"j-676 
[Homo sapiens]



0.027 



3e-04 



5e-14



7e-37 



<NONE> 



Pm5 protein - human 
>gi|1335273|gnl|PID|e36241 



<NONE> 



(J04734) CDC6 protein
[Saccharomyces cerevisiae]



4e-53 



<NONE> 



1e-04



<NONE> 



9.S 



171200 



(J04734) CDC6 protein
[Saccharomyces cerevisiae]



S.3 






Nearest Neighbor (BlastN vs. Genbank) 



SEQ 

ID | ACCESSION



1480



1481 



DESCRIPTION 



P VALUE | ACCESSION



1482 



1483 



1485 



1486



AF100694



M11167



Mus musculus 
Pontin52 mRNA, 
complete cds 



1487 AF100694 



1489 | X03541



Human 28S
ribosomal RNA gene. 
Mus musculus 
Pontin52 mRNA, 
complete cds 



Cloning vector 
pAP3neo DNA, 



1488 | AB003468 | complete sequence | 2e-24



Human mRNA of trk
oncogene >
gb|I96186|I96186
Sequence 23 from 
patent US 5734039 



7e-24 



2e-24 



2e-24 



1490 L81652 



1491 U95760



1492



AF100694



Homo sapiens

(subclone 2_g11 from

PI H43) DNA 

sequence 
Drosophila 

melanogaster 

strawberry notch 

(sno) mRNA. 

complete cds 



2e-24 



Mus musculus 
Pontin52 mRNA, 
complete cds 



2e-24 



2e-24 



8e-25 





Homo sapiens MLL- 




AF024541 


AF4 fusion protein 
mRNA. partial cds 


2e-23 


L13773 


Human AF-4 mRNA, 
complete cds. 



2e-23 


AF100694


Mus musculus 
Pontin52 mRNA. 
complete cds 


8e-24 


U75467 


Drosophila 

melanogaster Rga and 
Atu genes, complete 
cds 



8e-24 


D17076


Human HepG2 partial 
cDNA, clone 
hmd5a09m5 


7e-24 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



DESCRIPTION 



P VALUE 



serine/proline-rich FEL protein,
2136142 | splice form 1 - human



1e-20



(AF031404) MLL-AF4 fusion
3063962 protein [Homo sapiens]



1e-20



<NONE> 



<NONE> 



<NONE> 



1658503 



(U75467) Atu [Drosophila 
melanogaster]



2e-07



<NONE>



<NONE>

FMRFAMIDE-RELATED



<NONE>



NEUROPEPTIDES
PRECURSOR >gi|416208
(U03137) neuropeptide
precursor FMRFamide-related
116964 | peptide [Lymnaea stagnalis]



7e-10 



3875481 



54917; 



(Z81054) predicted using
Genefinder; Similarity to UDP-
glucoronosyltransferases



5.1 



USP1 PROTEIN PRECURSOR
>gi|169623



1.2 



987050 



(X65335) lacZ gene product 
[unidentified cloning vector]



0.058 



325465 



(M74509) [Human endogenous 
retrovirus type C oncovirus 
sequence.], gene product [Homo 
sapiens]



225047 



2078282 



262377. 



reverse transcriptase related 
protein [Homo sapiens] 



(U95760) Sno [Drosophila 
melanogaster] 



(AF004835) tyrocidine
synthetase 3 [Brevibacillus
brevis]



3e-04 



4e-12 



2e-41 



3.6 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



1493 | AB002405



1494 | K03002



1495 | U61232



1496 | U10245



Homo sapiens mRNA 
for LAK-4p. 
complete cds 



Human mRNA from 
chromosome 15 gene
with homology to 
MHC-HLA-SB-1 
intron A. 



1497 | X89211



1498 | L81653



Human tubulin-
folding cofactor E
mRNA, complete cds

Arabidopsis thaliana
Col-0 putative RNA
helicase A mRNA,
complete cds. 

H.sapiens DNA for 
endogenous retroviral 
like element 



8e-25 



8e-25 



7e-25 



5e-25 



Homo sapiens 
(subclone 2_g11 from
PI H43) DNA



1499 | X82895



1500 | M36654



1501 | L36315



1502 | AB018281



1503 | AF017433



H.sapiens mRNA for 
DLG2 



Mouse homeo box 
2.6 (Hox-2.6) mRNA,
complete cds. 



3e-25



2e-25 



Mus musculus (clone 
piVfL2-l) zinc finger 
protein

Homo sapiens mRNA 
for KIAA0733 
protein, complete cds 



9e-26 



9e-26 



Homo sapiens
putative transcription
factor CR53



9e-26 



9e-26 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION



DESCRIPTION 



P VALUE 



2496822 



1353239 



3323169 



1806134 



728831



HYPOTHETICAL 127.3 KD 
PROTEIN B0416.1 IN 
CHROMOSOME X >gi|746502 
(U23516) B0416.1 gene product 
[Caenorhabditis elegans]



(X92842) nuclear protein [Mus
1514614 | musculus]



9e-11



1e-13



(U61232) cofactor E [Homo
1465772 | sapiens] | 2e-05



(U10245) putative RNA
helicase A [Arabidopsis
thaliana]



1e-37



(Y12713) Pro-Pol-dUTPase
2065210 | polyprotein



5e-06



(U93568) putative p150 [Homo
sapiens] | 5e-16



MAGUK P55 SUBFAMILY
MEMBER 2 (MPP2 PROTEIN)
(DISCS, LARGE HOMOLOG
2497511 | 2)



(AE001255)T. pallidum 
predicted coding region TP0854



(Z67747) zinc finger protein
[Mus musculus]



!!!! ALU SUBFAMILY J
WARNING ENTRY



ZINC FINGER PROTEIN ZFP-
3219985 | 29



1e-34



1.9 



4e-05 



1e-07



1e-17






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



Homo sapiens 



1504 



(subclone 2_e6 from 
BAC H94) DNA 
AC001225 sequence



8e-26



Mus musculus
Pontin52 mRNA,
1505 | AF100694 | complete cds



DESCRIPTION 



P VALUE 



2653713 



(U91823) small S protein 
[Hepatitis B virus]



8e-26



1506 | X94912 | H.sapiens Pr22 gene | 3e-26



Mus musculus
Pontin52 mRNA,
1507 | AF100694 | complete cds



2e-26 



1508 



1509 



Human small GTP 
binding protein Rab9 
U44103 mRNA, complete cds 



283446 



cysteine-rich surface antigen 72,
CRP72 - Giardia lamblia
(fragment)



728837



!!!! ALU SUBFAMILY SQ
WARNING ENTRY



<NONE> 



<NONE> 



1e-26



3327038



AF100694



Mus musculus
Pontin52 mRNA,
complete cds
Homo sapiens



9e-27 



1510 



genomic DNA, 21q
region, clone: 
AG001212 9HI1N46 



9e-27 



126296 



Mus musculus mucin 
glycoprotein MUC3 
1511 | AF027131 | mRNA, partial cds



1512 



U49057 



1513 



J03764 



1514 Z78160



1515



Rattus norvegicus
CTD-binding SR-like
protein rA9 mRNA,
complete cds

Human plasminogen
activator inhibitor-1
gene, exons 2 to 9. 



9e-27 



2589172 



M. musculus partial 
cochlear mRNA 
(clone 2SD2) 



5e-27 



3e-27 



1438534



<NONE>



3e-27 



1490362 



Z64210 



H.sapiens CpG DNA. 
clone 99b4. reverse 
read cpg99b4.rt1a.



3e-27 



2257538



(AB014512) KIAA0612 protein

[Homo sapiens]

(ACU0D9yu; Contains repeated 
region with similarity to
gb|U43627 extensin (atExt1)
gene from Arabidopsis thaliana.
ESTs gb|Z34165 and gb|Z18788
come from this gene. 
4056454 [Arabidopsis thaliana] 



4.3 



3.4 



4e-09 



<NONE> 



8.7 



LINE- 1 REVERSE 
TRANSCRIPTASE 
HOMOLOG protein
[Nycticebus coucang]



(U76551) mucin Muc3 [Rattus
norvegicus]



0.14 



0.012 



2e-14



(U49057) rA9 [Rattus 
norvegicus] 



<NONE> 



(Z78160) unknown [Mus
musculus]



(AB004538) LIPOIC ACID
SYNTHETASE
PRECURSOR(LIP-SYN)
[Schizosaccharomyces pombe]



1e-04



<NONE> 



2e-05 



1e-06








Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ
ID




DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






Homo sapiens 










1516 


L35659 


(subclone H8 6_h6 
from PI 35 H5 CS) 
DNA sequence. 


1e-27


<NONE> 


<NONE> 


<NONE> 


1517 


AF100694


Mus musculus
Pontin52 mRNA,
complete cds


1e-27


1644471


(U72686) odorant receptor 4
[Danio rerio]


7.5


1518


AF100694


Mus musculus
Pontin52 mRNA,
complete cds


1e-27


2738388


(AF003534) hypothetical
protein 004L [Chilo iridescent
virus]


6.7 


1519 


AB009271 


Homo sapiens gene
for BCNT, partial cds


1e-27


3880909


(AL032636) Y40B1B.3
[Caenorhabditis elegans]


4.6 


1520 


AF100694


Mus musculus
Pontin52 mRNA,
complete cds


1e-27


2133579


spermatophorin Sp23 - yellow
mealworm molitor] 


0.85 


1521 


AF100694


Mus musculus
Pontin52 mRNA,
complete cds


1e-27


121805 


ENDOGLUCANASE A 
PRECURSOR 


0.58 


1522 


AF100694 


Mus musculus 
Pontin52 mRNA,
complete cds


1e-27


3722000 


(AF035323) survival motor 
neuron protein [Bos taurus] 


0.10 


1523 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-27 | 3328188 | (AF074902) laminin alpha chain [Caenorhabditis elegans] | 0.083
1524 | AF074382 | Homo sapiens IkB kinase gamma subunit | 1e-27 | 3641280 | (AF074382) IkB kinase gamma subunit [Homo sapiens] | 0.041
1525 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-27 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 6e-04
1526 | L78778 | Homo sapiens (subclone 2_e10 from P1 H49) DNA sequence | 1e-27 | 225047 | reverse transcriptase related protein [Homo sapiens] | 2e-09
1527 | L03427 | Human zinc finger protein basonuclin mRNA, complete cds. | 1e-27 | 1488275 | (U59694) zinc finger protein basonuclin [Homo sapiens] | 9e-22
1528 | U09954 | Human ribosomal protein L9 gene, 5' region and complete cds. | 4e-28 | 2257538 | (AB004538) LIPOIC ACID SYNTHETASE PRECURSOR (LIP-SYN) [Schizosaccharomyces pombe] | 2e-04






Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
1529 | Z64210 | H.sapiens CpG DNA, clone 99b4, reverse read cpg99b4.rt1a | 4e-28 | 3878570 | (Z46J81) similar to lipoic acid synthase; cDNA EST yk283b6.3 comes from this gene; cDNA EST yk283b6.5 comes from this gene; cDNA EST yk472f5.3 comes from this gene; cDNA EST yk472f5.5 comes from this gene; cDNA EST yk476e7.3... | 7e-11
1530 | U55177 | Danio rerio carbonic anhydrase homolog CAH-Z mRNA, complete cds | 4e-28 | 3123190 | CARBONIC ANHYDRASE (CARBONATE DEHYDRATASE) >gi|2576335 (U55177) CAH-Z [Danio rerio] | 5e-21
1531 | D43682 | Human mRNA for very-long-chain acyl-CoA dehydrogenase (VLCAD), complete cds | 4e-28 | 1351839 | ACYL-COA DEHYDROGENASE, VERY-LONG-CHAIN SPECIFIC PRECURSOR (VLCAD) >gi|930358 taurus] | 3e-27
1532 | AF016591 | Homo sapiens survival motor neuron pseudogene, complete sequence | 3e-28 | 728831 | !!!! ALU SUBFAMILY J WARNING ENTRY | 3e-08
1533 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 2e-28 | 728832 | !!!! ALU SUBFAMILY SB WARNING ENTRY | 2.5
1534 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 2e-28 | 118588 | DEHYDRIN DHN3 >gi|100035|pir||S18139 dehydrin DHN3 - garden pea >gi|20709 (X63063) pea dehydrin DHN3 [Pisum sativum] | 0.004
1535 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 2e-28 | 1169643 | FMRFAMIDE-RELATED NEUROPEPTIDES PRECURSOR >gi|416208 (U03137) neuropeptide precursor FMRFamide-related peptide [Lymnaea stagnalis] | 6e-04
1536 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 2e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 9e-05








Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
1537 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 2e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 2e-06


1538 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 2e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 2e-09
1539 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 2e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 1e-09
1540 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 2e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 5e-10
1541 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 2e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 1e-11
1542 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 2e-28 | 3157926 | (AC002131) Strong similarity to extensin-like protein gb|Z34465 from Zea mays. [Arabidopsis thaliana] | 8e-12


1543 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>
1544 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>
1545 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>








Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
1546 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>
1547 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>
1548 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>
1549 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>
1550 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>
1551 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>
1552 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>


1553 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>
1554 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>
1555 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>
1556 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>
1557 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>
1558 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>
1559 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>
1560 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>
1561 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | <NONE> | <NONE> | <NONE>






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



1562 



1563 



1564 



1565 



1566 



1567 



1568 



1569 



DESCRIPTION 



P VALUE 



AF100694 



AF100694 



AF100694



AF 100694 



M87708 



AF 100694 



AF 100694 



AF 100694 



ACCESSION 



DESCRIPTION 



Mus musculus 



P VALUE 



Pontin52 mRNA. 
complete cds 
Mus musculus 
Pontin52 mRNA, 
complete cds 
Mus musculus 
Pomin52 mRNA, 
complete cds 



le-28 



le-28 



le-28 



<NONE> 



<NONE> 



<NONE> 



Mus musculus 
Pontin52 mRNA, 
complete cds I le-28 

Human simple repeat 
polymorphism. | le-28 



. <NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



Mus musculus 
Pontin52 mRNA, 
complete cds 



le-28 



<NONE> 



VIus musculus 
Pontin52 mRNA, 
complete cds 



le-28 



3924779 



VIus musculus 
Pontin52 mRNA. 
omplete cds 



le-28 



116964; 



<NONE> 



<NONE> 



<NONE> 

B; cDNA EST yk450d8.5 comes 
from this gene; cDNA EST 
yk249a6.5 comes from this 
gene; cDNA EST yk2 19a2,5 
comes from this gene; cDNA 
EST yk355e4.5 comes from this 
gene; cDNA EST yk224f4.5 
comes fr... 

>gi|392488 l|gnljPID|el 354569 

yk249a6.5 comes from this 
gene; cDNA EST yk2l9a2.5 
comes from this sene; cDNA 
EST yk355e4.5 comes from this 
gene; cDNA EST yk224f4.5 

comes from... 

FMKFAMIDE-RELATED 

NEUROPEPTIDES 
PRECURSOR >gi|4 16203 
'U03137) neuropeptide 
precursor FMRFamide-related 
peptide [Lymnaea stagnalisl 



<NONE> 



<NONE>



<NONE> 



<NONE>



<NONE> 



<NONE> 



3.0 



0.66 



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Nearest Neighbor f BlastN vs. Genbank) 



SEQ 

ID I ACCESSION 



DESCRIPTION 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



P VALUE ACCESSION 



DESCRIPTION 



P VALUE 



1570 



1571 



Mus musculus 
Pontin52 mRNA, 
AF 100694 complete cds 



Mus musculus 
Pontin52 mRNA, 
AF10Q694 [complete cds 



Mus musculus 
Pontin52 mRNA, 
15721 AF100694 complete cds 



|Mus musculus 
Pontin52 mRNA, 
15731 AF10Q694 Icomplete cds 



1574 



1575 



Mus musculus 
Pontin52 mRNA, 
AF100694 complete cds 



1576 



1577 



Mus musculus 
Pontin52 mRNA, 
AF 100694 [complete cds 



AF 100694 



AF 100694 



Mus musculus 
[Pontin52 mRNA, 
complete cds 



Mus musculus 
Pontin52 mRNA, 
complete cds 



1 573 AFI00694 



Mus musculus 
Pontin52 mRNA. 
complete cds 



— 



le-2S 



3924779 



le-28 



2133579 



le-28 



D. iDNA EST .vUjOlKU mines 
from this gene; cDNA EST 
yk249a6.5 comes from this 
gene; cDNA EST yk2l9a2.5 
comes from this gene; cDNA 
EST yk355e4.5 comes from this 
gene; cDNA EST yk224f4.5 
comes fr... 

>gi|3924S8 l|gnl|PED|el354569 
from this gene; cDNA EST 
yk249a6.5 comes from this 
gene; cDNA EST yk219a2.5 
comes from this gene; cDNA 
EST yk355e4.5 comes from this 
gene; cDNA EST yk224f4.5 
comes from... 



spermatophorin Sp23 - yellow 
mealworm molitor] 



2133579 



le-2S 



283446 



le-28 



2498937 



spermatophorin Sp23 - yellow 
mealworm molitor] 



0.65 



0.49 



0.49 



cysteine-rich surface antigen 72,
CRP72 - Giardia lamblia 
(fragment) 



le-28 



le-28 



1492050 



2133579 



le-28 



118588 



le-28 



L185SS 



SPERMATOPHORIN SP23 
PRECURSOR mealworm 
>gi|161725 (M92923) structural 
protein 



(U60315) MC107L [Molluscum
contagiosum virus subtype 1]



spermatophorin Sp23 - yellow 
mealworm molitor] 



DEHYDRJN DHN3 
>gi| 1 00035|pir||S 1 S 1 39 dehydrin 
DHN3 - garden pea >gi|20709 
(X63063) pea dehydrin DHN3 
'Pisum sativum] 



0.45 



0.33 



0.1S 



o.oss 



DEHYDRIN DHN3 
>gi|l00035|pir||SLSl39 dehydrin 
DHN3 - garden pea >gi|20709 

X63063) pea dehydrin DHN3 

Pisum sativum] 



0.0 IS 



0.016 



1 



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SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



1579 AF100694 



1580 AF100694 



15811 AFI00694 



15821 AF100694 



1583 AF 100694 



Mus muse u I us 
Pontin52 mRNA. 
complete cds 



Mus muse u I us 
Pomin52 mRNA, 
complete cds 



VI us muse ul us 
5 ontin52 mRNA, 
complete cds 



Vtus musculus 
Pontin52 mRNA. 
complete cds 



1584 AF100694 



15851 AF 1 00694 



15861 AF 1 00694 



ontin52 mRNA, 
complete cds 



Mus musculus 
Pontin52 mRNA, 
complete cds 



Mus musculus 
Pontin52 mRNA, 
complete cds 



Mus musculus 
Pontin52 mRNA, 
complete cds 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



le-28 



118588 



le-28 



4056454 



[DEHYDRIN DHN3 
>gi|100035ip iri |Sl8139 dehydrin 
DHN3 - garden pea >gi|20709 
(X63063) pea dehydrin DHN3 

[[Pisum sativum! 

>) Contains repeated 
[region with similarity to 
gb|U43627 extensin (atExtl) 
gene from Arabidopsis thaliana. 
ESTs gb|234l65 and gb|2 18788 
come from this sene. 
[Arabidopsis thaliana] 



P VALUE 



le-28 



118588 



DEHYDRIN DHN3 
>gi| 100035|pir||S 18139 dehydrin 
DHN3 - garden pea >gi|20709 
(X63063) pea dehydrin DHN3 
|[Pisum sativum] 



le-28 



1169643 



FMRfAiVlIbE-ftSLAfED 

NEUROPEPTIDES 
PRECURSOR >gi|4 16208 
(U03137) neuropeptide 
precursor FMRFamide -related 
[peptide [Lymnaea stagnalis] 



le-28 



4056454 



le-28 



118588 



le-2S 



4056454 



(ALU059yO) Contains" repeated 

region with similarity to 

gb|U43627 extensin (atExtl) 

gene from Arabidopsis thaliana. 

ESTs gb|Z34|65 and gbiZ 18738 

come from this gene 

[Arabidopsis thalian aj 
DEHYDRIN DHN3 

>gi|I00035|pir||S I S 139 dehydrin 

DHN3 - garden pea >gij20709 

(X63063) pea dehydrin DHN3 

[Pisum sativum] 

(AC 005 990) Contains repeate 

region with similarity to 

gb|U43627 extensin (atExtl) 

gene from Arabidopsis thaliana. 

ESTs gb|234l65 and gb|ZlS788 

come from this yene. 

[Arabidopsis thalianaj 



0.012 



0.010 



0.002 



0.002 



0.002 



0.002 



le-28 



118588 



I DEHYDRIN DHN3 
>gi| 1 00035ipir||S IS 139 dehydrin 
DHN3 - garden pea >gi|20709 
(X63063) pea dehydrin DHN3 
[[Pisum sativuml 



0.002 



0.001 



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Nearest Neighbor ( BlastN vs. Genbank) [ 



accession! DESCRIPTION P VALUE 1 ACCESSION 



1587 AF100694 



1588 AF100694 



Mus musculus 
Pontin52 mRNA, 
complete cds 



Mus musculus 
Pontin52 mRNA, 
complete cds 



1589 AF100694 



1590 AF 1 00694 



1591 AF100694 



1592 AF100694 



159; 



AF 100694 



Mus musculus 
Pontin52 mRNA, 
complete cds 



Mus musculus 
Pontin52 mRNA, 
complete cds 



Mus musculus 
Pontin52 mRNA. 
complete cds 



VIus musculus 
Pontin52 mRNA, 
complete cds 



Mus musculus 
Pontin52 mRNA, 
complete cds 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



Ie-28 



le-28 



4056454 



4056454 



le-28 



ie-28 



4056454 



4056454 



le-28 



118588 



le-28 



4056454 



le-28 



4056454 



C. 



DESCRIPTION 
(ACG05990) Contains repeated 



P VALUE 



region with similarity to 
gb|U43627 extensin (atExtl) 
gene from Arabidopsis thaliana. 
ESTs gb|234165 and gb|2l8788 
come from this gene. 
[Arabidopsis thaliana] 



(AC0O599O) Contains repeated 
region with similarity to 
gb|U43627 extensin (atExtl) 
gene from .Arabidopsis thaliana. 
ESTs gb|234165 and gb|Z 18788 
come from this gene. 
Arabidopsis thalianal 



0.001 



"ACOODyyO) Contains repeated 
region with similarity to 
gb|U43627 extensin (atExtl) 
gene from Arabidopsis thaliana. 
ESTs gb|234165 and gb|Z 18788 
come from this gene. 
[Arabidopsis thaliana] 



6e-04 



5e-04 



|(AC005990) Contains repeated 
region with similarity to 
gb|U43627 extensin (atExtl) 
gene from Arabidopsis thaliana. 
ESTs gb|234165 and gb|2 18788 
come from this aene. 

I w 

[[Arabidopsis thaliana] 



5e-04 



DEHYDRIN DHN3 
>gi|l00035|pir||Sl8139 dehydrin| 
|dHN.3 - garden pea >gi|20709 
j(X63063) pea dehydrin DHN3 
[Pisum sativum] 



2e-04 



[(ALUUsyyO) Contains repeated 
region with similarity to 
gb|U43627 extensin (atExtl) 
gene from Arabidopsis thaliana. 
ESTs gb|234165 and gb!21878S 
come from this sene. 

I w 

[[Arabidopsis thaliana) I 2e-04 

KAC005990) Contains repeated 

region with similarity to 

gbjU43627 extensin (atExtl) 

gene from .Arabidopsis thaliana. 

ESTs eb|Z34165 and gbiZ18788 

come from this sene. 

[[Arabidopsis thaliana] I 5e-05 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



1594 1 AF100694 



Mus muse ul us 
Pontin52 mRNA. 
complete cds 



1595| AF100694 



Mus musculus 
Pontin52 mRNA, 
complete cds 



15961 AF 1 00694 



1597 Af 100 6 94 - 



Mus musculus 
Pontin52 mRNA. 
complete cds 



Mus musculus 
Pontin52 mRNA, 



1598 1 AF 100694 



Mus musculus 
Pontin52 mRNA, 
complete cds 



1599 AF100694 



1600 AF 100694 



Mus musculus 
Pontin52 mRNA. 
complete cds 



Mus musculus 
Pontin52 mRNA, 
complete cds 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



le-28 



4056454 



le-28 



4056454 



le-28 



4056454 



Je-2S 



4056454 



le-28 



4056454 



le-28 



4056454 



le-2S 



544357 



DES CRIPTION 

Contains repeatec 



P VALUE 



region with similajity to 
gb|U43627 extensin (atExtl) 
gene from Arabidopsis thaliana. 
ESTsgb|Z34165 and gb|Zl8788 
come from this gene. 

[Arabid 0 ps| s thai i anal 

(ALUO^yyO) Contains repeated 
region with similarity to 
gb|U43627 extensin (atExtl) 
gene from Arabidopsis thaliana. 
ESTs gb(Z34I65 and gb|Z 18788 
come from this gene. 
(Arabidopsis thaliana] 



5e-05 



(AC0O5990) Contains repeated 
region with similarity to 
gb|U43627 extensin (atExtl) 
gene from Arabidopsis thaliana 
ESTs gb|234165 and gb|Z!8788 
come from this gene. 

Arabidopsis tha liana] 

(ACOOoyyU) Contains repeate 
region with similaritv to 
gb|U43627 extensin (atExtl) 
gene from .Arabidopsis thaliana. 
ESTs gb|Z34165 and gb|Zl8788 
come from this gene. 
(Arabidopsis thaliana] 
(ACOuoyyO* Contains repeatei 
region with similarity to 
gb|U43627 extensin (atExtl) 
gene from Arabidopsis thaliana. 
ESTs gb[Z34!65 and gb|Zl87SS 
come from this gene. 

Arabidopsis thaliana] 

0*)yU) Contains repeate 
region with similarity to 
gb|U43627 extensin (atExtl) 
gene from Arabidopsis thaliana. 
ESTs gb|Z34165 and gb|Z 18788 
come from this gene. 
[Arabidopsis thaliana] 



le-05 



le-05 



9e-06 



6e-06 



RNA-BINDING PROTEIN 
FUS/TLS protein [human. 
Peptide, 526 aa] [Homo sapiens] 



5e-06 



4e-06 






Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
1601 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 2e-06


1602 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 2e-06
1603 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 9e-07
1604 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 8e-07
1605 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 1169643 | FMRFAMIDE-RELATED NEUROPEPTIDES PRECURSOR >gi|416208 (U03137) neuropeptide precursor FMRFamide-related peptide [Lymnaea stagnalis] | 7e-07
1606 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 6e-07
1607 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 5e-07



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***** '*"* 


Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
1608 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 3e-07


1609 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 1e-07
1610 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 1e-07
1611 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 7e-08
1612 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 2e-08
1613 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 6e-09







Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
1614 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 5e-09


1615 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 4e-09
1616 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 7e-10
1617 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 6e-10
1618 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 5e-10
1619 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 1e-28 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 4e-10
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SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



1620 AF100694 



1621 AF 100694 



16221 AF 100694 



1623| AF032896 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



Mus muse u I us 
Pontin52 mRNA, 
complete cds 



le-28 



4056454 



Mus musculus 
Pontin52 mRNA, 
complete cds 



le-2S 



4056454 



DESCRIPTION 
(ACUtb9SW) Lontains repeate 
region with similarity to 
gb|U43627 extensin (atExtl) 
gene from Arabidopsis thaliana. 
ESTs gb|Z34165 and gb|Z 18788 
come from this gene. 

I Arabidopsis thai i anal 

(ALQ059yu) Contains repeated 
region with similarity to 
gb|U43627 extensin (atExtl) 
gene from Arabidopsis thaliana. 
ESTs gb|234I65 and gb|2 18788 
come from this gene. 
{Arabidops is thaliana] 



P VALUE 



Mus musculus 
Pontin52 mRNA, 
c omplete cds 
etromyzon marinus 
jolyadenylate binding! 



protein 



le-28 



le-28 



4056454 



1082703 



1624 1 AFI00694 



1625 API 00694 



16261 AF100694 



16271 AF 1 00694 



ontains repeated 
region with similarity to 
gb|L : 43627 extensin (atExtl) 
gene from .Arabidopsis thaliana. 

[ESTs gb|Z34l65 and gblZlS788 

Icome from this sene. 

1 v 

[Arabidopsis thaliana] 

polyadenylate binding protein II 
human 



2e-10 



5e-Ll 



Mus musculus 

complete cds 
Mus musculus 
Pontin52 mRNA, 
complete cds 



9e-29 



9e-29 



11S58S 



2133579 



Mus musculus 

ontin52 mRNA, 
complete cds 



Mus musculus 
Pontin52 mRNA, 
complete cds 



9e-29 



3876465 



DEHYDRIN DKN3 
>gi|100035|piri|Sl8139 dehydrin 
DHN3 - garden pea >gi|20709 
X63063)pea de hvdrin DHN3 
[Pisum sativum] 

spermatophorin Sp23 - yellow 
mealworm molitor] 

(281071) predicted using 
Genefinder; Similarity to 
Human small nuclear 
ribonucleo protein E cDlNA EST 
yk375g7.5 comes from this 
gene; cDNA EST yk435r*5.3 
comes from this sen... 



8e-29 



4056454 



Lontains repeated 
region with similarity to 
gb|U43627 extensin (atExtl) 
gene from .Arabidopsis thaliana. 
ESTs eb|Z34l65 and gb;ZlS7SS 
come from this gene. 
{Arabidopsis thaliana] 



2e-I2 



2e-27 



6e-04 



9e-06 



2e-06 
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SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) j 



P VALUE 



ACCESSION 



DESCRIPTION 



P VALUE 



1629 



1630 



1631 



1633 



1634 



1635 



1628 AF 1 00694 



Mus musculus 
Pontin52 mRNA, 
complete cds 



Mus musculus 
Pomin52 mRNA, 
AF 100694 |completecds 



Mus musculus 
Pontin52 mRNA, 
AFIO0694 [complete cds 



4e-29 



4e-29 



Human mRNA for 
very-long-chain acyl 
CoA dehvdroeenase 

I w- 

(VLCAD). complete 
D43682 cds 



4e-29 



M.tuberculosis accBC 
1632| Y07660 Igene 



Human alpha-satellite 
IDNA from clone 
X55367 |pTRA-2. 



Homo sapiens 
(subclone l_fl from 
PI H54) DNA 
L81866 jsequence 



{ Alu repeats, clone 
I52H10} [human, 
colonic mucosa. 
S75940 Genomic. 943 ntl 



4e-29 



4e-29 



le-29 



le-29 



le-29 



72888: 



544357 



4056454 



II682S7 



2113935 



<NONE> 



<NONE> 



72883 1 



ADP-RIBOSYLATION
FACTOR 3 fruit fly (Drosophila
melanogaster) >gi|507234
(L25063) ADP ribosylation
factor 3 [Drosophila
melanogaster]



RNA-BINDING PROTEIN 

FUS/TLS protein [human. 

Peptide 526 aa] [Homo sapiens^ 
(ALUU^yyU) Contains repeated 

region with similarity to 

gb|U43627 extensin (arJE.xtl) 

gene from Arabidopsis thaliana. 

ESTs gb|Z34l65 and gb|Z 18788 

come from this gene. 

[Arabidopsis thaliana] 



ACYL-COA
DEHYDROGENASE, VERY-
LONG-CHAIN SPECIFIC
PRECURSOR (VLCAD)
dehydrogenase precursor - rat
Acyl-CoA dehydrogenase
[Rattus norvegicus]



(Z95556) accD1
[Mycobacterium tuberculosis]



0.016 



2e-07 



le-08 



6e-37 



3e-47 



<NONE> 



<NONE> 



!!!! ALU SUBFAMILY J
WARNING ENTRY



<NONE> 



<NONE> 



le-07 



1636 



Homo sapiens 
AB001907 PACE4eene. e.xon 13 



le-29 



728831 



!!!! ALU SUBFAMILY J
WARNING ENTRY 



2e-09 



1637 



Mus musculus SH3 
Idomain-containing 
adapter protein 
AF077003 [mRNA. complete cds 



5e-30 



<NONE> 



<NONE> 



<NONE>
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l * m 


Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
1638 | AF100694 | Mus musculus Pontin52 mRNA, complete cds | 4e-30 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 3e-10


1639 | M27072 | Xenopus laevis poly(A)-binding protein (ABP-EF) mRNA, complete cds. | 4e-30 | 1352709 | POLYADENYLATE-BINDING PROTEIN polyadenylate-binding protein - African clawed frog laevis] | 5e-21
1640 | X58386 | B.taurus mRNA for bovine vacuolar ATPase subunit A | 2e-30 | 2773154 | (AF039573) abscisic acid- and stress-inducible protein | 4.3
1641 | Y07660 | M.tuberculosis accBC gene | 1e-30 | 2113935 | (Z95556) accD1 [Mycobacterium tuberculosis] | 4e-47
1642 | AJ236940 | Sus scrofa mRNA for hypothetical protein (5'; clone 7C4) | 4e-31 | 4102021 | (AF007561) delta 6-desaturase [Borago officinalis] | 7.4
1643 | AF039400 | Homo sapiens calcium-dependent chloride channel-1 (hCLCA1) mRNA, complete cds | 2e-31 | 3721912 | (AB017156) gob-5 [Mus musculus] | 7e-08
1644 | L77036 | (subclones d9 from P1 H19) DNA sequence. | 1e-31 | 461663 | BOMBYXIN B-2 HOMOLOG PRECURSOR silkmoth >gi|2 iv j83|gnl|PID|d1003528 (D13924) Samia bombyxin homolog B-2 [Samia cynthia] | 11
1645 | X61971 | H.sapiens mRNA for macropain subunit delta | 1e-31 | 296734 | (X61971) macropain subunit delta [Homo sapiens] | 3e-06
1646 | L00016 | human mitochondrial trnas and partial proteins 4 & 5; histidyl-, seryl-, leucyl-trna genes; urf4 and urf5 (partial). | 5e-32 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 0.002
1647 | M17887 | Human acidic ribosomal phosphoprotein P2 mRNA, complete cds. | 5e-32 | 4056454 | (AC005990) Contains repeated region with similarity to gb|U43627 extensin (atExt1) gene from Arabidopsis thaliana. ESTs gb|Z34165 and gb|Z18788 come from this gene. [Arabidopsis thaliana] | 1e-05
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Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
1659 | U53446 | Human mitogen-responsive phosphoprotein DOC-2 mRNA, complete cds. | 6e-34 | 3395443 | (AC004683) putative ammonium transporter, 3' partial | 4.7


1660 


AFQ13988 


Homo sapiens serine 
protease mRNA, 
complete cds 




4ju f — — O 


PROTEIN-TYROSINE 
PHOSPHATASE EPSELON 
PRECURSOR (R-PTP- 
EPSILON) >gi| 1439605 
(U62387) protein tyrosine 
pnospnaitise-e [iviuo itiu^luiusj 




1661 | U53446 | Human mitogen-responsive phosphoprotein DOC-2 mRNA, complete cds. | 2e-34 | 104757 | LEP100 protein precursor - chicken >gi|212254 gallus] | 1.6


1662 


AJ233632 


Homo sapiens 
endogenous retroviral 

w 

sequence ERV-L pol 
gene, clone ERV-L 

¥ T £. 

Humano 


1e-34


joOUj I J 


(AJ233597) reverse 
transcriptase [ivius [amuiusj 




1663 


AF086310 


Homo sapiens full 
length insert cDNA 
clone ZD51F08 


8e-35 


2947070 


(AC002521) putative Ser/Thr 
protein kinase [Arabidopsis 
thaliana] 


2.3 


1664 


A. 1 / SVO 


Human mRNA for 

Li_rvcp J 




73065"> 


40S RIBOSOMAL PROTEIN
S2 (STRINGS OF PEARLS 
PROTEIN) 

>oi| 1085 153tairllS^0325 
ribosomal protein S2 - fruit fly 
(Drosophila melanogaster) 
melanogaster) >gi|5 15972 
Cti01'535 , l ribosomal orotein S2 


2e-10 


1665 


AB011137 


Homo sapiens mRNA
for KIAA0565 
protein, complete cds 


3e-35 


3043654 


(AB011137) KIAA0565 protein
[Homo sapiens]


2e-16


1666 


U62801 


Human protease M 
mRNA, complete cds


2e-35 


3929231 


(AF091247) potassium channel 
[Rattus norvegicus]


1.0 


1667 


AF020760 


Homo sapiens serine 
protease (Omi) 
mRNA, complete cds


1e-35


2738915


(AF020760) serine protease
[Homo sapiens]


9e-U J
Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






1668


Z93943


Human DNA
sequence from
cosmid U2^DHj on
chromosome X


8e-36 




(M22333) unknown protein 
[Homo sapiens]


3e-10


1669 


X06778 


Rabbit 18S rRNA 


7e-36


118588


DEHYDRIN DHN3
>gi|100035|pir||S18139 dehydrin
DHN3 - garden pea >gi|20709
(X63063) pea dehydrin DHN3
[Pisum sativum]


0.011


1670 


AB007962 


Homo sapiens 
mRNA, chromosome
1 specific transcript 
KIAA0493 


3e-36 


3329243 


(AE001350) hypothetical
protein [Chlamydia trachomatis]


3.1 


1671


Z81014 


Human DNA
sequence from
cosmid U65A4,
between markers
DXS366 and DXS87
on chromosome X * 


3e-36 


14! IUJ 


HYPOTHETICAL PROTEIN 
ORF-1137 mouse 


0.038 


1672 


Z81014


Human DNA
sequence from
cosmid U65A4,
between markers
DXS366 and DXS87
on chromosome X *


3e-36


198651


(M29325) ORF1 [Mus
musculus]


0.006 



1673 


U49082 


Human transporter 
protein (g17) mRNA,
complete cds 


3e-36 


1840045 


(U49082) transporter protein
[Homo sapiens]


2e-15


1674 


J03133 


Human transcription 
factor SP1 mRNA, 3'
end. 


3e-36 


All 1 n 

4/ / 133 


HF-1 regulatory element binding 
protein - rat 


2e-31 


1675


AB007934 


Homo sapiens mRNA 
for KIAA0465
protein, partial cds


1e-36




(AB007934) protein
[Homo sapiens] 


4e-?7 


1676 


M34857 


Mouse Hox-2.5 
mRNA. 


9e-37 


106296 


homeotic protein Hox B9 - 
human (fragment) 


0.15 


1677 


L35657 


Homo sapiens 
(subclone H8 5_a10
from P1 35 H5 C8)
DNA sequence. 


9e-37 


2072960 


(U93568) p40 [Homo sapiens]


3e-05 


1678


X80240


H.sapiens
endogenous
retrovirus HERV-
KC4 DNA


Se-37 


4185944 


(Y17833) env protein [Human
endogenous retrovirus K]


1e-15



Nearest Neighbor (BlastN vs. Genbank)



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



SEQ
ID


ACCESSION


DESCRIPTION 


P VALUE 



ACCESSION 


DESCRIPTION 


P VALUE 






1679


Z93943


Human DNA
sequence from
cosmid IP35H3 on
chromosome X


9e-38 


106322 


hypothetical protein (L1H 3'
region) - human


4e-13 


1680 


X97303 


H.sapiens mRNA for 
Pte- 1 2 protein 


4e-38 


466044


HYPOTHETICAL ZINC
FINGER PROTEIN ZK686.4
IN CHROMOSOME III
>gi|630780|pir||S44909 ZK686.4
protein - Caenorhabditis elegans
>gi|304346 (L17337) coded for
by C. elegans cDNAs
GenBank:M88869 and T01933;
putative [Caenorhabditis
elegans]


3e-37 


1681 


Y08999 


H.sapiens mRNA for 
Sop2p-like protein 


3e-38


3334339


S0P2-LIKE PROTEIN 


5e-06 


1682 


Z62887


H.sapiens CpG DNA, 
clone 74g6, forward 


2e-38


1245686


(U53181) F36D4.2 gene 
product [Caenorhabditis 


0.19


1683 


U35032 


Human endogenous
retrovirus clone
c5.11, HERV-H
multiply spliced
subgenomic leader, 
protease and integrase 
region mRNA, partial 
cds 


1e-38


59977 


(Z14310) tripartite fusion 
transcript PLA2L [Human
endogenous retrovirus]


1e-06


1684 


D86974 


Human mRNA for 
partial cds 


1e-38


3337386


(AC002544) Unknown gene
product splice form 2 [Homo
sapiens]


Se-1 1 


1685 


M31013 


Human nonmuscle 
myosin heavy chain 
(NMHC) mRNA, 3'
end


1e-38


4115748


(AB022023) nonmuscle myosin
heavy chain B


2e-11


1686 


AF006087 


Homo sapiens Arp2/3 
protein complex 
subunit p20-Arc 
(ARC20) mRNA,
complete cds 


4e-39 


<NONE> 


<NONE> 


<NONE> 


1687 


X58374 


D.melanogaster crn
mRNA 


4e-39 


2655888 


(AL009171) 62D9.a 
[Drosophila melanogaster]


4e-42 


1688 


D85815 


Human DNA for 
rhoHP1, complete cds


1e-39


134080 


GTP-BINDING PROTEIN 
TC10 ras-like protein [Homo 
sapiens]


3e-26 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















1689 


U49057 


Rattus norvegicus 
CTD-binding SR-like
protein rA9 mRNA, 
complete cds 


4e-40 


1438534 


(U49057) rA9 [Rattus 
norvegicus] 


5e-05 


1690 


Y08999 


H.sapiens mRNA for
Sop2p-like protein


4e-40 


3334339 


SOP2-LIKE PROTEIN 


9e-08 


1691 


AB002293 


Human mRNA for 
KIAA0295 gene,
partial cds 


4e-40 


2224531 


(AB002293) KIAA0295 [Homo
sapiens]


1e-30


1692 


AF086222 


Homo sapiens full 
length insert cDNA 
clone ZC66E08 


1e-40


2829669 


DOUBLE-STRANDED RNA- 
SPECIFIC EDITASE 1
(DSRNA ADENOSINE
DEAMINASE) (RNA
EDITING ENZYME 1)
>gi|1707502|gnl|PID|e254627
(X99227) double-stranded RNA-
specific editase [Homo sapiens]
editase 1 hRED1-L [Homo
sapiens] >gi|2039300 (U76421)
dsRNA adenosine deaminase 
DRADA2b [Homo sapiens] 


0.61 


1693 


AF044127 


Homo sapiens 
peroxisomal short-
chain alcohol
dehydrogenase
(SCAD-SRL) mRNA,
complete cds


1e-40


4105190


(AF044127) peroxisomal short-
chain alcohol dehydrogenase


2e-06 


1694 


U36778 


Mus musculus Sil
mRNA, complete cds


1e-40


88608


SIL protein - human >gi|338088
(M74558) SIL 


6e-23 


1695 


U36778 


Mus musculus Sil
mRNA, complete cds


1e-40


88608


SIL protein - human >gi|338088
(M74558) SIL 


6e-23 


1696 


U36778 


Mus musculus Sil 
mRNA, complete cds


1e-40


88608


SIL protein - human >gi|338088
(M74558) SIL 


5e-23 


1697 


U36778


Mus musculus Sil
mRNA, complete cds


1e-40


88608


SIL protein - human >gi|338088
(M74558) SIL


5e-23 


1698


AB018285


Homo sapiens mRNA
for KIAA0742
protein, partial cds


1e-40


3832205


(AB018285) KIAA0742 protein
[Homo sapiens]


6e-31 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



SEQ 
ID


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION


DESCRIPTION 


P VALUE 












1699


X75927


M.musculus abc2
mRNA


1e-40


728773


ATP-BINDING CASSETTE
TRANSPORTER 1 ABC1 -
human >gi|495257 (X75926)
abc1 [Mus musculus]


3e-37 


1700 


AF038200 


Homo sapiens clone 
23954 mRNA 
sequence 


5e-41 


3211975 


(AF068195) putative 
glialblastoma cell differentiation 
related protein [Homo sapiens] 


5e-14


1701 


U20521


Human estrogen 
sulfotransferase 
(STE) gene, exon 3 
and complete cds 


4e-41 


<NONE>


<NONE> 


<NONE> 


1702 


AF026548 


Homo sapiens 
branched chain alpha- 
ketoacid 

dehydrogenase kinase 
precursor, mRNA, 
nuclear gene 
encoding 
mitochondrial 
protein, complete cds 


2e-41 


3182923 


[3-METHYL-2- 
OXOBUTANOATE 
DEHYDROGENASE 
(LIPOAMIDE)] KINASE
PRECURSOR alpha-ketoacid
dehydrogenase kinase precursor
[Homo sapiens]


2e-09


1703 


Y07660 


M. tuberculosis accBC 
gene 


2e-41


465847


HYPOTHETICAL 66.5 KD
PROTEIN F02A9.5 IN
CHROMOSOME III
>gi|280542|pir||S28313
hypothetical protein F02A9.5 -
Caenorhabditis elegans
Gene finder; similar to Propionyl-
CoA carboxylase beta chain;
cDNA EST EMBL:M89018
comes from this gene; cDNA
EST EMBL:D28069 comes
from this gene; cDNA EST
EMBL:D28068 comes from this
gene; cDNA EST ... 


3e : 8 


1704 


AG001237 


Homo sapiens 
genomic DNA, 21q 
region, clone: 

9H11N46


1e-41


106322


hypothetical protein (L1H 3'
region) - human


5e-09 


1705 


AB007934 


Homo sapiens mRNA 
for KIAA0465 
protein, partial cds 


1e-41


3413892


(AB007934) KIAA0465 protein
[Homo sapiens]


3e-12


1706 


AF055029 


Homo sapiens clone 
24711 mRNA 
sequence 


5e-42 


3250681 


(AL024486) putative protein





Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 












1707


Z49747


O.cuniculus mRNA
for phospholipase C


5e-42


130227


1-PHOSPHATIDYLINOSITOL-
4,5-BISPHOSPHATE
PHOSPHODIESTERASE
DELTA 1 (PLC-DELTA-1)
(PHOSPHOLIPASE C-DELTA-
1) (PLC-III) >gi|163538
(M20638) phospholipase C-III
[Bos taurus] 


5e-36 


1708 


M93651 


Human set gene, 
complete cds. 


2e-42 


<NONE>


<NONE> 


<NONE> 


1709 


AJ236940 


Sus scrofa mRNA for 
hypothetical protein 
(5*: clone 7C4) 


2e-42 


2062403 


(U79010) delta 6 desaturase 
[Borago officinalis]


8.5 


1710 


J03634 


Human erythroid 
differentiation protein 
mRNA 


2e-42 


1708436 


INHIBIN BETA A CHAIN 
PRECURSOR 


2e-10 


1711 


AJ223777 


Mus musculus mRNA 
for striatin 


6e-43 


2494917 


STRIATIN 

>gi|1495773|gnl|PID|e254158


2e-32 


1712 


AF016411 


Homo sapiens 
potassium channel 
subunit KCNA3.1B


2e-43


2708514


(AF016411) KCNA3.1B [Homo
sapiens] 


3e-13 


1713 


AC001443 


Homo sapiens 
(subclone 2_f10 from
BAC 2913


1e-43


111814


hypothetical protein 3 - rat
>gi|56589


2e-06 


1714 


X82895


H.sapiens mRNA for
DLG2


6e-44


2497511


MAGUK P55 SUBFAMILY
MEMBER 2 (MPP2 PROTEIN)
(DISCS LARGE HOMOLOG
2)


6e-52 


1715 


U17077


Human BENE
mRNA, partial cds.


3e-44 


53912 


(X57960) ribosomal protein L7 
[Mus musculus] >2ic5489 


8e-30 


1716 


AJ222700 


Homo sapiens mRNA 
for TSC-22 protein


2e-44 


<NONE> 


<NONE> 


<NONE> 


1717 


J03634 


Human erythroid 
differentiation protein 

mRNA 


2e-44 


124279 


INHIBIN BETA A CHAIN
PRECURSOR PROTEIN)
(EDF) >gi|87936|pir||B24248
inhibin beta-A chain precursor -
human >gi|181947 (J03634)
erythroid differentiation protein 
precursor [Homo sapiens] 
sapiens] 

>gi!22oS50|prt]!I60S260B 
inhibin betaA [Homo sapiens]


0.73 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION


DESCRIPTION


P VALUE


ACCESSION


DESCRIPTION


P VALUE 
















1718 


AB014518


Homo sapiens mRNA
for KIAA0618
protein, complete cds 


7e-45 


1911548 


voOUOu**j Lyiocnrumc (.mikc 
polypeptide sapiens] 


1.6 


1719 


X76808 


H.sapiens genomic 
DNA clone d2 


7e-45 


868201 


(U29380) similar to adenylate 
cyclase [Caenorhabditis elegans] 


2e-09 


1720 


AB021288 


Homo sapiens mRNA
for beta 2- 
microglobulin, 
complete cds 


2e-45 


2465521


(U95995) RNA-dependent RNA
polymerase [Cryptosporidium
parvum]


0.15 


1721


X63468


H.sapiens mRNA for 
transcription factor 
TFIIE alpha 


oe-*fO 


1 i 

<n Kjri £> 






1722


AF019226


Homo sapiens D2-2
mRNA, 3'UTR


7e-46


<NONE> 


<NONE> 


<NONE> 


1723 


D31764 


Human mRNA for 
KIAA0064 gene, 
complete cds 


2e-46


3123050 


HYPOTHETICAL PROTEIN 
KIAA0064 


1e-15


1724 


K02774 


Human MHC class II
HLA-DR-beta-psi
(DW4/DR4)
pseudogene, exons
3,4,5,6, clones cosII-
3301 and cosII-801.


1e-46


4185946 


(Y17834) gag protein [Human

endogenous retrovirus K] 


2e-14 


1725 


X92109 


H.sapiens hcglX gene 


9e-47


24Vo 1 oj 


BRIDE OF SEVENLESS
PROTEIN PRECURSOR
>gi|1079166|pir||A47550 bride
of sevenless precursor - fruit fly
(Drosophila virilis) >gi|290216
virilis]


] 4 


1726 


X93334 


H.sapiens 

mitochondrial DNA. 
complete genome






NADH-UBIQUINONE
OXIDOREDUCTASE CHAIN
4 >gi|86696|pir||A00435 NADH
dehydrogenase (ubiquinone)


4e-15 


1727 


M85145 


Human tumor 
necrosis factor 
receptor, 3' flank.


3e-47


<NONE>


<NONE> 


<NONE> 



1728 


X80240 


H.sapiens 
endogenous 
retrovirus HERV- 
KC4 DNA 


3e-47 


4185944 


(Y17833) env protein [Human
endogenous retrovirus K] 


7e-lS I 


1729 


Z63594 


H.sapiens CpG DNA,
clone 87f9, forward
read cpg87f9.ft1a.


le-47 


3322743 


(AE001222) T. pallidum
predicted coding region TP0454 


\ 

2.4
Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ
ID


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






1730


X62295


R.rattus mRNA for
vascular type-1
angiotensin II
receptor


4e-48 


1209756 


(U43629) integral membrane 
protein [Beta vulgaris] 


1e-07


1731 


M85145 


Human tumor 
necrosis factor 
receptor, 3' flank. 


3e-48 


<NONE>


<NONE> 


<NONE> 


1732 


AB020712 


Homo sapiens mRNA 
for KIAA0905 
protein, complete cds 


4e-49 


4240299 


(AB020712) KIAA0905 protein 
[Homo sapiens] 


2e-20 


1733 


AB020712 


Homo sapiens mRNA 
for KIAA0905 
protein, complete cds 


3e-49 


4240299 


(AB020712) KIAA0905 protein
[Homo sapiens] 


2e-20 


1734 


X62295 


R.rattus mRNA for
vascular type-1
angiotensin II
receptor


1e-49


1209756


(U43629) integral membrane
protein [Beta vulgaris]


7e-12


1735


AJ007509


Homo sapiens mRNA
for E1B-55kDa-
associated protein


1e-49


3319956


(AJ007509) E1B-55kDa-
associated protein


4e-24 


1736 


X97303 


H.sapiens mRNA for 
Pte- 12 protein 


1e-49


466044


HYPOTHETICAL ZINC
FINGER PROTEIN ZK686.4
IN CHROMOSOME III
>gi|630780|pir||S44909 ZK686.4
protein - Caenorhabditis elegans
>gi|304346 (L17337) coded for
by C. elegans cDNAs
GenBank:M88869 and T01933;
putative [Caenorhabditis
elegans]

Se-3< 


1737 


AF038404 


Homo sapiens 
homolog of Nedd5
(hNedd5) mRNA,
complete cds 


4e-50 


<NONE> 


<NONE> 


<NONE>


1738 


L43618 


Homo sapiens 
polycystic kidney 
disease (PKD1) gene,
exons 35-42


4e-50


903758


(L43619) polycystic kidney 
disease 1 protein [Homo 
sapiens] 


3e-' - 


1739 


AF009424 


Homo sapiens clone 
22 mRNA, alternative
splice variant alpha-1,
complete cds 


4e-50 


2271473 


(AF009426) clone 22 [Homo
sapiens]


*• 



WO 01/02568 PCT/US00/18374 





Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 












1740


L77040


Homo sapiens
(subclone 8_cl t from
PI H22) DNA
sequence.


2e-50


99758


monosaccharide transport protein
STP4 - Arabidopsis thaliana
>gi|16524 (X66857) sugar
transport protein [Arabidopsis
thaliana]


6.4 


1741 


L35657 


Homo sapiens 
(subclone H8 5_a10
from P1 35 H5 C8)
DNA sequence. 


2e-50 


2072960 


(U93568) p40 [Homo sapiens] 


2e-05 


1742 


U80745


Homo sapiens CTG7a
mRNA, partial cds


1e-50


<NONE> 


<NONE> 


<NONE> 


1743 


D84514 


Bovine mRNA for
p97, partial cds


1e-50


3978527


(AF103728) structural
polyprotein [Sindbis virus]


9.9 


1744 


M22960 


Human protective 
protein mRNA, 
complete cds. 


1e-50


131081


LYSOSOMAL PROTECTIVE
PROTEIN PRECURSOR
(CATHEPSIN A)
(CARBOXYPEPTIDASE C)
human >gi|190233 (M22960)
protective protein precursor


1e-12


1745 


X86018


H.sapiens mRNA for
MUF1 protein


1e-50


1082610


muf1 protein - human
>gi|762953 (X86018) muf1
[Homo sapiens]


1e-21


1746 


U03495 


Human transcription 
factor LSF-ID 
mRNA, complete cds. 


7e-51 


2136296 


transcription factor LSF - human 
>gi|476099 


1e-21


1747 


AB015344


Homo sapiens
HRIHFB2157
mRNA, partial cds


5e-51


3970874


(AB015344) HRIHFB2157
[Homo sapiens] 


2e-35 


1748 


M93339 


Human zinc finger 
protein mRNA. 


4e-51


3024110


MYC-ASSOCIATED ZINC
FINGER PROTEIN sapiens! 


2e-06 


1749 


U71363 


Human zinc finger 
protein zfp6 (ZF6) 
mRNA, partial cds


4e-51


2689441


(AC003682) F18547_1 [Homo
sapiens]


2e-11


1750 


X56932 


H.sapiens mRNA for
23 kD highly basic
protein


4e-51


730451


60S RIBOSOMAL PROTEIN
L13A (23 KD HIGHLY BASIC
PROTEIN)
>gi|345897|pir||S29539 basic
protein, 23K - human >gi|23691
(X56932) 23 kD highly basic
protein [Homo sapiens]


1e-11


1751


Z79054


H.sapiens flow-sorted
chromosome 6
HindIII fragment,
SC6pA21E11


2e-51


<NONE> 


<NONE> 


<NONE> 








Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






1752


AF068245


Homo sapiens
BAF60b gene, partial
sequence


c _ en 


<NONE>


<NONE>


<NONE> 


1753 




Sus scrofa mRNA for 
hypothetical protein 
(5 ; clone 4B8) 


c „ CO 

oeoz 


400927


RIBONUCLEOPROTEIN
RB97D ribonucleoprotein
[Drosophila melanogaster]


A "7 


1754 


AF003693 


Mus musculus 
scaffold protein Pbpl 
homolog mRNA,
complete cds 


6e-53 


2197106 


(AF003693) scaffold protein
Pbp1 homolog [Mus musculus]


2e-54 


1755 


M27319 


Human calmodulin 
mRNA, complete cds. 


5e-53 


115528 


CALMODULIN 
>gi|102408|pir||JC1309 
calmodulin - Stylonychia lemnae 
(SGC5) >ai|L6I195 


0.002 


1756 


M745D3 


Mouse house-keeping 
protein mRNA, 
complete cds. 


5eoj 


284775 


house-keeping protein - mouse 
>gi|l9jS7l 


0e-jU 


1757 


X92720 


H.sapiens mRNA for 
phosphoenolpyruvate 
carboxykinase


6e-54 


2135915 


phosphoenolpyruvate 
carboxykinase (GTP) (EC 
4.1.1.32) precursor,
mitochondrial - human 
carboxykinase (GTP) [Homo 
sapiens] 


6e-21 


1758


AF007872


Homo sapiens torsinB
(DQ1) mRNA, partial
cds






(AB002405) LAK-4p [Homo 


U — .' 


1759


U49507 


Mus musculus 
B6CBA Lisch7 
mRNA, partial cds. 


2e-54 


1236083 


(U49507) Lisch7 [Mus 
musculus] 


3e-27 


1760 


Z73360 


Human DNA
sequence from
cosmid 92M18,
BRCA2 gene region
chromosome 13q12-
13.


leoD 


2370371 


(Y14657) hydrophobin
[Pleurotus ostreatus]
>gi|2982620|gnl|PID|e1283986

^AJ-^JuOl J run- 11} UI upnuui 11 

[Pleurotus ostreatus]


2.0 


1761 


U83702 


Human cytochrome c 
oxidase subunit VIa
gene, exon 3 and 
complete cds 


8e-56 


2982994


(AE000682) hypothetical
protein [Aquifex aeolicus]


7.0 


1762 


Y12781


Homo sapiens mRNA
for transducin (beta)
like 1 protein


7e-56


3021409


(Y12781) transducin (beta) like
1 protein [Homo sapiens]


7e-?9
Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















1763 


AB020673 


Homo sapiens mRNA 
for KIAA0866
protein, complete cds 


8e-57 




2104553 


(AF001548) Myosin heavy 
chain (MHY11) (5' partial)
[Homo sapiens]


4e-04 


1764 


AJ236932 


Sus scrofa mRNA for 
hypothetical protein 
(5': clone 4B8) 


3e-57 


400927 


RIBONUCLEOPROTEIN
RB97D ribonucleoprotein
[Drosophila melanogaster]


4.7 


1765 


L06900 


Human dystrophin 
gene, intron 1 
containing pseudo 
exon. 


1e-58


4185129


(AC005724) unknown protein
[Arabidopsis thaliana]


7.0 


1766 


X93334 


H.sapiens
mitochondrial DNA,
complete genome


9e-59


1492050


(U60315) MC107L [Molluscum
contagiosum virus subtype 1]


0.17 


1767 


AF064856 


Rattus sp. 7acomp 
protein mRNA,
complete cds 


3e-59 


3169626 


(AF064856) 7acomp protein 
[Rattus sp.]


2e-31 




1768 


AF081484 


Homo sapiens alpha-
tubulin isoform 1
mRNA, complete cds


2e-59


32015


(X06956) alpha-tubulin [Homo
sapiens]


4e-22 


1769 


X71427 


Homo sapiens mRNA 
for FUS-CHOP
protein fusion 


le-60 


746557 


(U23523) histidine-rich 
[Caenorhabditis elegans] 


0.45 


1770 


AF013988


Homo sapiens serine
protease mRNA,
complete cds


1e-60


2564316


(AB006622) No similarities to
any reported proteins [Homo
sapiens]


0.26 


1771 


U25691


Mus musculus
lymphocyte specific
helicase mRNA,
complete cds


7e-61


2137490


lymphocyte specific helicase -
mouse musculus]


3e-25


1772 


X93334 


H.sapiens 

mitochondrial DNA, 
complete genome


4e-61


70656


ubiquitin / ribosomal protein
S27a - human extension protein,
HUBCEP80 [human, Peptide,
156 aa] ubiquitin extension
protein [Cavia porcellus]


9e-08


1773 


D38255


Homo sapiens mRNA
for CAB1, complete
cds


4e-61


2135214


gene MLN 64 protein - human


4e-23 


1774


U25691


Mus musculus
lymphocyte specific
helicase mRNA,
complete cds


8e-62


2137490 


lymphocyte specific helicase - 
mouse musculus] 


Se-26
Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















1775 


M21731 


Human lipocortin-V
mRNA, complete cds.


6e-62 


3212603 


Human Annexin V With Proline 
Substitution By Thioproline 


2e-20 


1776 


AF021936 


Rattus norvegicus 
myotonic dystrophy 
kinase-related Cdc42- 
binding kinase 
MRCK-beta (MRCK- 
beta) mRNA, 
complete cds 


2e-62 


2736153 


(AF021936) myotonic
dystrophy kinase-related Cdc42-
binding kinase MRCK-beta
[Rattus norvegicus]


3e-27 


1777 


Y12059


H.sapiens HUNK1
mRNA


1e-62


3184498


(AC004798) R31546.1 [Homo
sapiens] 


3e-09 


1778 


L37368 


Human (clone E5.1) 
RNA-binding protein
mRNA, complete cds.


6e-63


477578


sialidase - Actinomyces viscosus
>gi|141852


7.8 


1779 


M27877


Figure 2. Nucleotide
and translated protein
sequences of HPF1, -
2, and -9.


De-63 


1731443


ZINC FINGER PROTEIN 83
(ZINC FINGER PROTEIN
HPF1) >gi|106023|pir||A32891
finger protein 1, placental -
human 


je-j j 


1780 


AF095448 


Homo sapiens 
putative G protein- 
coupled receptor 


2e-63 


3116131 


(AL02328S) hypothetical 
protein 


4.6 


1781 


L19437


Human transaldolase
mRNA containing
transposable element,
complete cds


2e-63


I rci i i a 

133 j 1 19 


(U63159) transaldolase [Mus
musculus]


j#» l <? 

•4C - 1 0 


1782 


L41351 


Homo sapiens 
prostasin mRNA, 
complete cds 


1e-63


2833277


PROSTASIN PRECURSOR
precursor - human >gi|862305
(L41351) prostasin [Homo
sapiens] >gi|1143194 (U33446)
prostasin [Homo sapiens] 


6e-14 


1783 


AF053470 


Homo sapiens 10kD
protein (BC10)
mRNA, complete cds


6e-64 


482237 


hypothetical protein K03H1.9 - 
Caenorhabditis elegans 


0.029 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ
ID


ACCESSION


DESCRIPTION 


P VALUE 


ACCESSION


DESCRIPTION 


P VALUE 

















1784 


D37791


Mouse mRNA for
beta-1,4-
galactosyltransferase




• 

388010^ 


(Z93390) Similar to FYVE zinc
finger; cDNA EST yk265b4.5
comes from this gene; cDNA
EST yk359g9.5 comes from this
gene; cDNA EST yk319c2.5
comes from this gene
[Caenorhabditis elegans] zinc
finger; cDNA EST yk265b4.5
comes from this gene; cDNA
EST yk359g9.5 comes from this
gene; cDNA EST yk319c2.5
comes from this gene
[Caenorhabditis elegans]


3e-16 


1785 


AF015770 


Mus musculus radical 
fringe (radical-fringe)
mRNA, complete cds


6e-64


2204355


(U94350) radical fringe
precursor [Mus musculus]


1e-36


1786 


Z79054 


H.sapiens flow-sorted
chromosome 6
HindIII fragment,
SC6pA21E11


2e-64 


<NONE> 


<NONE> 


<NONE> 


1787 


M83094 


Homo sapiens 
cytosolic selenium-
dependent glutathione
peroxidase gene,
complete cds, and
rhoh12 gene, 3' end.


1e-64


2447063


(U42580) A565R [Paramecium
bursaria Chlorella virus 1]


8.8 


1788 


Y10211 


H.sapiens LAG-3
gene, promoter region


7e-65


1944540


(X14112) tegument protein
[human herpesvirus 1]


2.3


1789 


M19045


Human lysozyme
mRNA, complete cds.


2e-65


<NONE>


<NONE> 


<NONE> 


1790 


U01882


Homo sapiens SS- 
A/Ro autoantigen 52 
kda component gene, 
complete cds 


2e-65 


585401 


LIPASE MODULATOR
PRECURSOR (LIPASE
HELPER PROTEIN)
>gi|480045|pir||S36249 lipB
protein - Pseudomonas glumae 
>gi|49207 (X70354) helper 
protein 


4.2 


1791


AF069517 


Homo sapiens RNA 
binding protein DEF- 
3 mRNA, complete 
cds 


2e-65 


3212101 


(AF069517) RNA binding 
protein DEF-3 [Homo sapiens] 


\ 

1e-25
SEQ
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



Homo sapiens jerky



1792



gene product
homolog mRNA,
AF004715 complete cds



1793 X59652



1794 U94350 



1801 



C. longicaudatus hprt 
mRNA for 
hypoxanthine 

Mus musculus radical 
fringe precursor 
mRNA, complete cds



Mus musculus
putative

lysophosphatidic acid
acyltransferase
1795 AF015811 mRNA, complete cds

Cow

phosphoinositide-
specific
phospholipase C



2e-65 



3e-66 



3e-67 



3e-68 



1796 J03137



Rattus norvegicus
putative peroxisomal
2,4-dienoyl-CoA
reductase (DCR-
AKL) mRNA,
1797 AF044574 complete cds



Mus musculus
putative

lysophosphatidic acid
acyltransferase



1798 AF015811 mRNA, complete cds



3e-69 



le-69 



1799 X65157



M.musculus mRNA
for desmoyokin,
partial



1800 Z97207



Mus musculus mRNA
for B-IND1 protein



U27196



Gallus gallus zinc
finger protein (Fzf-1)
mRNA, complete cds.



4e-70 



5e-74 



2e-74 



6e-75 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



2314829 



631625 



2204355 



2317725 



226908 



4105269 



2317725



109781 



2231019 



984814 



P VALUE 



(AF004715) jerky gene product
homolog [Homo sapiens] | 2e-45
hypoxanthine (guanine)
phosphoribosyltransferase - long-
tailed hamster
phosphoribosyltransferase
[Cricetulus longicaudatus] | 6e-54



(U94350) radical fringe 
precursor [Mus musculus]



(AF015811) putative
lysophosphatidic acid
acyltransferase [Mus musculus]



phospholipase C 154 [Bos 
taurus] 



(AF044574) putative
peroxisomal 2,4-dienoyl-CoA
reductase [Rattus norvegicus]



(AF015811) putative
lysophosphatidic acid
acyltransferase [Mus musculus]



(U27196) zinc finger protein
[Gallus gallus]



2e03 



7e-51 



3e-25



2e-33 



3e-19 



desmoyokin - mouse (fragment)
>gi|50675 | 9e-37



(Z97207) B-IND1 protein [Mus
musculus] | 6e-21



2e-44
Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 












1802


Y15054


Rattus norvegicus
mRNA for 70 kDa
tumor specific
antigen, partial


3e-77


3123027


70 KD WD-REPEAT TUMOR-
SPECIFIC ANTIGEN
>gi|2505957|gnl|PID|e353992
(Y15054) 70 kD tumor-specific
antigen [Rattus norvegicus]


4e-42 


1803 


X65157 


M.musculus mRNA 
for desmoyokin,
partial 


3e-79 


109781 


desmoyokin - mouse (fragment) 
>gi|50675 


9e-33 


1804 


U50736 


Rattus norvegicus 
cardiac adriamycin
responsive protein
mRNA, complete cds


2e-84


1362781


cytokine inducible nuclear
protein C193 - human
>gi|793841 (X83703) nuclear
protein [Homo sapiens]




1805 


AF072865 


Rattus norvegicus 
thioredoxin reductase 
(TrxR2) mRNA,
nuclear gene
encoding
mitochondrial
protein, complete cds


2e-84


3757888


(AF072865) thioredoxin
reductase [Rattus norvegicus]


6e-43 


1806 


AF044574 


Rattus norvegicus 
putative peroxisomal 
2,4-dienoyl-CoA 
reductase (DCR- 
AKL) mRNA, 
complete cds 


6e-85 


4105269 


(AF044574) putative 
peroxisomal 2,4-dienoyl-CoA 
reductase [Rattus norvegicus]


1e-41


1807 


U19181 


Rattus norvegicus 
Rabin3 mRNA, 
complete cds. 


2e-87


624225


(U19181) Rabin3 [Rattus
norvegicus] 


2e-41 


1808


U40342


Mus musculus ninein
mRNA, complete cds.


1e-91


1113865


(U40342) ninein [Mus
musculus]


2e-36 


1809


X67877


R.norvegicus mRNA
for cytosolic
resiniferatoxin-
binding protein


4e-92


136077 


TROPOMYOSIN BETA 3, 
FIBROBLAST chicken 
>gip 1 D694 (M2 j082) 
tropomyosin [Gallus gallus]


0.56 


1810 


AF044574 


Rattus norvegicus
putative peroxisomal
2,4-dienoyl-CoA
reductase (DCR-
AKL) mRNA,
complete cds


5e-93


4105269


(AF044574) putative
peroxisomal 2,4-dienoyl-CoA
reductase [Rattus norvegicus]


1e-50


1811 


AF035527 


Mus musculus EHF 
(Ehf) mRNA,
complete cds


2e-95


3138930


(AF035527) EHF [Mus
musculus] 


2e-47 






Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



DESCRIPTION 



1812 AB016930



1813 AB005549



1814 X90849



1815 S79873



1816 U67203



1817 L14684



1818 X84692



1819 U50736 



1820 S66855 



Cricetulus griseus 
mRNA for 
Phosphatidylglycerop 
hosphate synthase, 
complete cds 



Rattus norvegicus 
mRNA for atypical 
PKC specific binding 
protein, complete cds 



G.gallus PB1 gene



lamp-2=lysosome-
associated membrane
protein-2 protein-2b
(LAMP2) mRNA,
alternatively spliced
form h-lamp-2b,
complete cds



Mus musculus ACF7
neural isoform 1 
(mACF7) mRNA, 
partial cds 



6e-96 



4159682 



7e-97 



3868778 



2e-97 



2134381 



3e-98 



<NONE> 



P VALUE 



(AB016930)

Phosphatidylglycerophosphate
synthase [Cricetulus griseus]



7e-41 



(AB005549) atypical PKC 
specific binding protein [Rattus 
norvegicus] 



3e-41 



polybromo 1 protein - chicken 
chicken >gi[95!231 (X90849) 
polybromo 1 protein [Gallus
gallus] 



le-34 



<NONE> 



<NONE> 



Rattus norvegicus
nuclear-encoded 
mitochondrial 
elongation factor G 
mRNA, complete cds 



M.musculus Spnr
mRNA for RNA
binding protein



Rattus norvegicus 
cardiac adriamycin 
responsive protein 
mRNA, complete cds 



HoxB9=Hox-2.5
[mice, embryos,
mRNA Partial, 786
nt]



2e-98



1675224 



e-100 



585084 



e-I33 



1363238 



e-113 



1362781



e-107 



1708355



(U67204) ACF7 neural isoform 
2 [Mus musculus] 



9e-39 



ELONGATION FACTOR G, 



PRECURSOR (MEF-G) 
>gi|543383|pir||S40780
translation elongation factor G,
mitochondrial - rat >gi|310102



2e-30 



spermatid perinuclear RNA- 
binding protein Spnr - mouse 
>gi|673454 (X84692) spermatid 
perinuclear RNA binding 
protein [Mus musculus]



5e-35 



cytokine inducible nuclear 
protein C193 - human 
>gi|793841 (XS3703) nuclear 
protein [Homo sapiens]



2e-36 



HOMEOBOX PROTEIN HOX- 
B9 (HOX-2.5) 



Se-37 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



1821 



DESCRIPTION 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



P VALUE | ACCESSION



HoxB9=Hox-2.5



1822 



[mice, embryos,
mRNA Partial, 786
S66855 | nt]



1823 



1824 



Rattus norvegicus m-
tomosyn mRNA,
U92072 | complete cds



Mouse mRNA for 
kinesin-like protein
D17577 | (Kif1b), complete cds



Mus musculus SDP8 
AF062484 | mRNA, complete cds



DESCRIPTION 



1825 | X73683



18261 U6720: 



182' 



D17577



R. norvegicus mRNA 
for histone H3.3 



e-108 



1708355 



e-102 




3790389 



e-129 



e-122 



2497524 



3126981 



e-109 



122075 



Mus musculus ACF7 
neural isoform 1 
(mACF7) mRNA, 
partial cds 



e-102 



1675224 



Mouse mRNA for 
kinesin-like protein
(Kif1b), complete cds



e-131



P VALUE 



HOMEOBOX PROTEIN HOX 
B9(HOX-2.5) 



(U92072) m-tomosyn [Rattus 
norvegicus]



2497524 



KINESIN-LIKE PROTEIN 
KIF1B mouse 

>gi|407339|gnl|PID|d1005029
(D17577) Kif1b [Mus
musculus] 



(AF062484) SDP8 [Mus
musculus] 



(H3.3Q) histone H3.3 - fruit fly 
Drosophila melanogaster) 
histone H3.3B - chicken 
>gi|2II9023jpir||S612lS histone 
-13.3 - fruit tly (Drosophila 
hydei) 1-136) [Oryctolagus 
cuniculusj >gijS046 (X53822) 
Histone H3.3Q gene product 
[Drosophila melanogaster] 
>gi|5U9S gallusl>gi|161190 
(MI7S76) histone H3 [Spisuia 
solidissimaj >gi|2 11 853 
(MI 1393) histone 3.3 [Gallus 
gallus] >gi|306848 (Mi 1354) 
H3.3 histone [Homo sapiens] 
melanogaster] >gi|96303 1 
(XSI205) hisione H3.3 H3.3A 
variant [Drosophila 
melanogasterl musculus) 



(U67204) ACF7 neural isoform
2 [Mus musculus] 



4e-37 



2e-38 



2e-39 



5e-40 



2e-40 



KINESIN-LIKE PROTEIN 
KIF1B mouse

>gi|407339|gnl|PID|d1005029
(D17577) Kif1b [Mus
musculus] 



2e-40 



7e-42 






Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



1828 | AB016930



1829 U09S74 



1830 X99145 



1831 | X99836



P VALUE 



Cricetulus griseus
mRNA for
Phosphatidylglycerop
hosphate synthase,
complete cds

Mus musculus SKD3 
mRNA. complete cds 



C.familiaris mRNA 
for C3VS protein 



P.walti mRNA for 
rnp associated protein 
55 



1832 AF077003 



1833| AF060246 



Mus musculus SH3 
domain-containing 
adapter protein 
mRNA. complete cds 



Mus musculus strain 
C57BL/6 zinc finger 
protein 106 (Zfp106)
mRNA, H3a-a allele, 
complete cds 



e-131 



e-122 



e-110 



e-106 



e-121 



e-118



1834 1 2U030 



1835 AF077003 



1836 L20427 



R.norvegicus mRNA 
for TRAP-complex 
gamma subunit. 



Mus musculus SH3 
domain-containing
adapter protein 
mRNA. complete cds 



Rattus norvegicus
dihydroxypolyprenylb
enzoate

methyltransferase
mRNA, complete cds



e-120 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



DESCRIPTION 



4159682 



2493735 



(AB016930) 

Phosphatidylglycerophosphate
synthase [Cricetulus griseus] 

SKD3 PROTEIN SKD3 [Mus 
musculus]



1429314 



(X99145) overexpressed in 
thyroid tissue after TSH 
stimulation [Canis familiaris]



(X99836) rap55 [Pleurodeles
4200286 | waltl]



3550240 



(AF077003) SH3 domain-
containing adapter protein; 
CD2AP 



3372657 



(AF060246) zinc finger protein
106 [Mus musculus]



1174453 



e-132 



3550240 



e-116 



457372 



TRANSLOCON-
ASSOCIATED PROTEIN,

GAMMA) (SIGNAL 
SEQUENCE RECEPTOR 
GAMMA SUBUNIT) (SSR- 
GAMMA) 

>gi|423185|pir||S33294
translocon-associated protein
gamma chain - rat norvegicus] 



(AF077003) SH3 domain-
containing adapter protein; 
CD2AP 



(L20427)

dihydroxypolyprenylbenzoate

methyltransferase

dihydroxypolyprenylbenzoate

methyltransferase [Rattus

norvegicus]



3h 



P VALUE 



3e-43 



7e-48 



2e-50 



3e-51



1e-52



5e-54 






Nearest Neighbor (BlastN vs. Genbank)



SEQ 

ID | ACCESSION



1837 



DESCRIPTION 



X80169 



1838| AF080568 



1839 



X99145 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



M.musculus mRNA 
for 200 kD protein 



Rattus norvegicus 

CTP:phosphoethanol

amine

cytidylyltransferase
mRNA, complete cds



C.familiaris mRNA
for C3V5 protein



Pan troglodytes breast
and ovarian cancer
susceptibility
(BRCA1) gene,
1840 | AF019075 | partial cds



1841 



Bos taurus myosin X,
U55042 | complete cds



1842 



Mus musculus mRNA
for poly(ADP-ribose)
AJ007780 | polymerase-2



e-122



e-119 



e-121 



e-145 



e-122 



1843 



1844 



Rattus norvegicus 
thioredoxin reductase 
(TrxR2) mRNA, 
nuclear gene 
encoding 
mitochondrial 
AF072865 protein, complete cds 



Bos taurus myosin X, 
U55042 | complete cds



1845 



Mouse E46 mRNA 
X61506 | for E46 protein



Bovine mRNA for 
GTP-binding protein 
1846 | D90335 | alpha-subunit



1847 



Mus musculus 
B6CBA Lisch7 
U49507 | mRNA, partial cds.



e-119 



e-105 



e-121



e-139 



e-US 



e-140 



1717793 



3396102 



1429314 



2218154 



1755049 



3283975 



3757888 



1755049 



114909 



DESCRIPTION 



585174 



2121326 



PROTEIN TSG24 (MEIOTIC 



(AF080568) 

CTP:phosphoethanolamine 
cytidylyltransferase



(X99145) overexpressed in 
thyroid tissue after TSH 
stimulation [Canis familiaris] 



(AF005068) breast and ovarian
cancer susceptibility protein 
splice variant [Homo sapiens] 



(U55042) myosin X [Bos 
taurus]



(AF072521) poly-(ADPribosyl)
transferase homolog PARP



(AF072865) thioredoxin 
reductase [Rattus norvegicus] 



(U55042) myosin X [Bos 
taurus]



BRAIN PROTEIN E46 



GUANINE NUCLEOTIDE-
BINDING PROTEIN, ALPHA- 
4 SUBUNIT (GL1) 
>gi| 10871 l!piri|A40891 GTP- 
binding protein GL1 alpha chain 
bovine protein, alpha-subunit 
[Bos taurus]



AC002l2S;Lisch7[Homo 
sapiens] 



P VALUE 



CHECK POINT 
REGULATOR) 
>gi|1083553|pirl|A55117 tsg24 | 2e-56 



6e-58 



2e-5S 



leoS 



1e-61



4e-62 



3e-62 



1e-62



9e-6T 



2e-6? 



2e-74 






Table 4 





Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE


ACCESSION 


DESCRIPTION 


P VALUE


1


<NONE> 


<NONE> 


<NONE> 


<NONE> 






2


<NONE> 


<NONE> 


<NONE> 


<NONE> 






3 


<NONE> 


<NONE> 


<NONE> 


<NONE> 






4 


<NONE> 


<NONE> 


<NONE> 


<NONE> 






5 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE>




6 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE>




7 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE>




8 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE>


<NONE>


9 


<NONE> 


<NONE> 


<NONE> 


<NONE> 



<NONE>


<NONE>


10 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE>




11


<NONE> 


<NONE> 


<NONE> 


<NONE> 



<NONE>


<NONE>


12 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


13 


<NONE> 


<NONE> 


<NONE> 


<NONE> 






14 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


15 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE>




16 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


17 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


18


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE>


<NONE>


19 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


20 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE>




21 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


22 


<NONE> 


<NONE> 


<NONE> 


<NONE> 


<NONE>




23 


<NONE> 


<NONE> 


<NONE> 


1079469 


lMDC I protein - crab-eating 


9 t 


24 


<NONE> 


<NONE> 


<NONE> 


3043656 


(AB0IU3S) KIAA05&6 protein 
[Homo sapiens]


9.3 


25 


<NONE> 


<NONE> 


<NONE>


112175 


potassium channel protein RK5 -




26 


<NONE> 


<NONE> 


<NONE> 


37696^4 


(AF091565) olfactory receptor

[Rattus norvegicus]


7 7 


27 


<NONE> 


<NONE> 


<NONE> 


3876443 


(281517) F2SB1.6 
[Caenorhabditis elegans]


7.1 


28 


<NONE> 


<NONE> 


<NONE> 


2224464 


(AB001684) ORf 249 [Chlorella 
vulgaris] 


6.9 


29 


<NONE> 


<NONE> 


<NONE> 


1519707 


(U67940) ORFvealOe; random 
cDNA sequence [Dictyostelium
discoideum] 


6.7 


30 


* 

<NONE> 


<NONE> 


<NONE> 


227491 


protein kinase C 11 [Xenopus 
laevis]


6.7 


31 


<NONE> 


<NONE> 


<NONE> 


630575 


C50C3.4 protein - 
Caenorhabditis elegans


6.0 


32 


<NONE> 


<NONE> 


<NONE> 


137200 


35 KD PROTEIN IN RNA2 
clover necrotic mosaic virus 
>*i|61466 (X0S0:n ORF for 35 
; Da polypep; i de t A A i - 3 IT/ 
Red clover necrotic mosaic 
virus] 


\ 

\ 

6.0 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



33 



34 



37 



38 



40 



41 



42 



44 <NONE> 



45 



DESCRIPTION 



<NONE> 



<NONE> 



35 | <NONE>



36 | <NONE>



<NONE> 



<NONE> 



39 | <NONE>



<NONE> 



<NONE> 



<NONE> 



43 | <NONE>



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



P VALUE | ACCESSION 



<NONE> 



<NONE> 



<NONE> 



DESCRIPTION 



<NONE> 




<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



30041 



2493585 



1001450 



(X16711) pid:g30041 [Homo



P VALUE 



sapiens]



CELL DIVISION PROTEIN 
FTSW 



3182918 



140011 



3979943 



950203 



3560232 



730071 



2506641 



3511232 



1150900



(D63999) hypothetical protein 



NITROGEN REGULATORY 
PROTEIN AREA

Mitochondrial 

RIBOSOMAL PROTEIN S5 

Emericella nidulans 

mitochondrion (SGC3) 

>gi|12709 nidulans] >gi|472822 

(JO 1390) unknown protein 
^ALUJ4393) predicted using 

Genefinder; similar to WD 

domain, G-beta repeat; cDNA 

EST yk362f7.5 comes from this 

gene; cDNA EST yk362f7.3 

comes from this gene 

[Caenorhabditis elegans]



(U31329) polyketide synthase
[Aspergillus terreus] 



(AL031530) hypothetical zinc 
finger protein 

[Schizosaccharomyces pombe]



AXONEME-ASSOCIATED
PROTEIN MST101(1) product
[Drosophila hydei] 



HYPOtriETiCALil.7 Kb 
PROTEIN IN INTE-PIN 
INTERGENIC REGION 
>gi|1787402 (AE000214) orf,
hypothetical protein
[Escherichia coli]



(AF071556) anthranilate 
dioxygenase large subunit 



(U43139) envelope glycoprotein
gp120 [Human

immunodeficiency virus type II 



3876099 



(Z75536) similar to dynein
heavy chain; cDNA EST 
EMBL:D27549 comes from this 
gene; cDNA EST 
EMBL:D34859 comes from this 

gene [Caenorhabditis elegans]



3H 



5.9 



5.7 



5.7 



5.2 



4.3 



4.0 



3.3 



3.0 



2.6 



2.5 



2.4 



1.9 



1.4 






SEQ 
ID 



51 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



47 | <NONE>



48 | <NONE>



49 | <NONE>



50 | <NONE>



<NONE>



52 | <NONE>



54 | <NONE>



55 | <NONE>



56 | <NONE>



DESCRIPTION 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



3881150 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



132200 
2204286 



1723955 



3201564 



2808721 



602434 



3347955 



1255587 



103076 



DESCRIPTION 



(AL032647) predicted using



P VALUE 



Genefinder



COLANIC ACID CAPSULAR
BIOSYNTHESIS
ACTIVATION PROTEIN A
>gi|95605|pir||S17701 rcsA
protein

(U61380) germination protein 
[Bacillus megaterium]

PROTEIN IN FOX1-KEX1
INTERGENIC REGION 
>gi|2132566|pir||S64222 
probable membrane protein 
YGL204c - yeast 
(Saccharomyces cerevisiae) 

>gi|1322838|gnl|PID|e243803 
(Z72726) ORF YGL204c

[Saccharomyces cerevisiae] 



(AJ006514) prolipoprotein
diacylglyceryl transferase
[Vibrio cholerae]



(AL02142S) hypothetical 
[protein RvQ064 



(U17986) GABA/noradrenaline
transporter [Homo sapiens] 



(AF076184) cytosolic sorting

protein PACS-1b [Rattus

norvegicus]

coded tor by L" 

(elegans cDNA yk92b4.5: coded 

I for by C. elegans cDNA 

Iyk73al.5; coded for by C. 

elegans cDNA ykl02e9.5; 

coded for by C. elegans cDNA 

yk7|cS.5; coded for byC. 

elegans cDNA yk66dll.5; 

coded for by C. elegans cDNA 

vk66c3... 



1.4



Bkm-like sex-determining

region hypothetical protein
CS314 - fruit fly (Drosophila
melanogaster)



107560 



Ras inhibitor (clone JC265) 
|httman sapiens] 



1.1 



1.0



0.84 



0.31 



0.27 



0.13 



0.12 



0.074 



o.oo; 



o.oo: 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



57 



58 



59 



61 



62 



63 



64 



65 



<NONE> 



<NONE> 



<NONE>



60 | <NONE>



<NONE> 



<NONE> 



<NONE> 



<NONE> 



66 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



Rat annexin V gene, 
D42133 | exon7 and exon8



67 



L35679 



Homo sapiens 
(subclone H8 2_dll 
from PI 35 H5 C8) 
DNA sequence- 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



P VALUE | ACCESSION



DESCRIPTION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



5.0 



5.0 



103076 



2702370 



3859713 



2088839 



121059 



3875246 



1465834



3327136 



3880433 



<NONE> 



10S6902 



Bkm-like sex-determining



P VALUE 



region hypothetical protein 
CS314 - fruit fly (Drosophila
melanogaster) 



(AF038604) contains similarity 
to Drosophila ovarian tumor
locus protein (GB:X13693)
[Caenorhabditis elegans]



(AL033501) phox domain 
protein [Candida albicans] 



(AF003386)F59E12.5 gene 
product [Caenorhabditis 

elegans] 

GC-RICH SEQUENCE DNA-
BINDING FACTOR GCF - 
human >gi|179412 (M29204) 
DNA-binding factor [Homo 

sapiens]

k/ai4yu> similar to WD 

domain, G-beta repeats (2 

domains); cDNA EST 

EMBL.T004S2 comes from this 

gene; cDNA EST 

EMBL:T00923 comes from this 

gene; cDNA EST yk449d4.3 

comes from this gene; cDNA 

EST yk449d4.5 comes from this 

[gen... 



(U64857) No definition line 
found [Caenorhabditis elegans] 



(AB014561) KIAA0661 protein
[Homo sapiens]



(Z66521) similar to 
mitochondrial RNA splicing
MSR4 like protein; cDNA EST 
EMBL:C09217 comes from this 
gene [Caenorhabditis elegans]



<NONE> 



(U41-2/H) coded for byC. 

elegans cDNA yk79g8.5; coded 
'for by C. elegans cDNA 
IcmlOcS; coded for by C. elegans 
cDNA yk79g8.3; similar to 
leucine-rich repeats found in
many proteins [Caenorhabditis
elegans]



2e-04 



6e-05 



3e-05 



2e-08 



4e-09 



9e-24 



9e-28 



1e-29



8e-31 



<NONE> 



6.6 







SEQ 
ID 



68 



69 



74 



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



HIV- 1 strain BX220 



from USA, envelope 
glycoprotein C2V3 
region (env) gene,

U90184 partial cds 

Human myosin VIIa
(MYO7A) gene, 5"
U61465 exon37 



70 AF013717 



71 | X58245



Homo sapiens 
periplakin (PPL) 
mRNA, partial cds
Soybean mRNA for
HMG-1 like protein 



72 | AF102425



Frasera paniculata 
tRNA-Leu (trnL)
gene, intron, 
chloroplast sequence



PTPIOHCP- variant 
73 | X82817 | gene



U04827 



75 I AF038S59 



Mus musculus brain
fatty acid-binding
protein



Neospora hughesi 
strain NE1 interna! 
transcribed spacer 1, 
complete sequence 



5.0 



5.0 



5.0 
5.0 



4.9 



4.9 



4.9 



4.8 



DESCRIPTION 



1297070 



2313225 



3719238
2995363 



3522958 



3875514 



3676132 



P VALUE 



(Z71986) convicilin precursor 
[Vicia narbonensis] 



(AE000535) L-lactate permease
(lctP) [Helicobacter pylori

26695] 

(AF064869) brain-enriched 
guanylate kinase-associated
protein 2; BEGA2 [Rattus 
norvegicus] 

(AL022245) brofin synthase 



(AC004411) putative 

pectinesterase [Arabidopsis

thaliana]



EMBL:D27474 comes from this 
gene; cDNA EST 
EMBL:D27473 comes from this 
gene; cDNA EST 
EMBL.T00471 comes from this 
gene; cDNA EST 
EMBL:D34192 comes from this 
SI 



<NONE> 



EMBL:D37241 comes from this 
gene;

(AL031765) 1- 

evidence=predicted by content; 
l-method=genefinder;084; 1- 
method_score=31.96; 1- 
evidence_end; 2- 
evidence=predicted by match; 2 

match_accession=SPTREMBL: 
Q93319; 2- 

match_description=HYPOTHE
TICAL PROTEIN C33A11.2.;...



<NONE> 



6.6 



5.0 



3.8 
0.99 



6.4 



2.8 



2e-09 



<NONE> 







Nearest Neighbor (BlastN vs. Genbank)



76 



77 



78 



79 



80 



81 



82 



83 



ACCESSION 



I 



DESCRIPTION 



84 



Y08222 



AJ224475 



M.musculus MFH-1



gene 



Borrelia burgdorferi 
left chromosomal 
subtelomeric region 
(pfpB gene) 



U02486 



Mus musculus LAP 
putative membrane 
protein (KRAG) 
gene, exon 3 and 
complete cds 



AB000280 



Rat mRNA for 
peptide/histidine 
transporter, complete 
cds 



Z49771



A.cepa mitochondrial 
gene for NADH 
dehydrogenase 
subunit 3 and 
ribosomal protein 
S12 



M63494 



Mouse IgG receptor
(beta-Fc-gamma-RII) 
gene, exons 6 and 7, 
clones lambda- 
Fc(3.2,93). 



Z14035 | S.pombe car1 gene



U17129



Rhodococcus 
erythropolis ThcA 
(thcA) gene, complete 
cds; and unknown 
genes 



Plasmodium
falciparum
chromosome 2,
section 23 of 73 of
the complete

AE001386 | sequence

Human clone 23734
85 | U79292 | mRNA sequence



86 



Chloroplast Euglena
gracilis gene coding
for the 5S and 16S
V00159 | rRNA.



P VALUE 



4.8 



4.8 



4.8 



4.8 



4.5 



4.3 



2.0 



2.0 



2.0 



1.9 



1.9 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



P VALUE 



<NONE> 



4218141 



3258103 



806317 



<NONE> 



<NONE> 



3790665 



2828280 



4176500 



<NONE> 



<NONE> 



(AJ236702)HMR1 protein 
[Antirrhinum majus] 



<NONE> 



<NONE> 



(AP000006) 367aa long 
hypothetical protein 
[Pyrococcus horikoshii] 



(M29067) unknown protein 
[Saccharomyces cerevisiae]



8.3 



2.7 



0.001 



<NONE> 



<NONE> 



<NONE> 



(AF099000) No definition line 
found [Caenorhabditis elegans] 



(AL021687) putative protein 
[Arabidopsis thaliana] 

>gi|2832633|gnl|PID|e1249651
(AL021711) putative protein
[Arabidopsis thaliana] 



<NONE> 



1.2 



2e-26 



(AL031177) dJ889M15.3 (novel
protein) 



<NONE> 



9e-59 



<NONE> 



<NONE> 



<NONE>






Nearest Neighbor (BlastN vs. Genbank)



88 



90 



91 



92 



93 



94 



95 



96 



97 




X93206 



89 | U60979 



X56272 



Xenopus laevis XL- 
INCENP (XL- 
INCENP) mRNA. 

complete cds 

H.salinarium TATA 
box-binding protein 
genes and ORFs

Caenorhabditis 
elegans programmed 
cell death specifier 
(ces-2) gene, 
complete cds 



DESCRIPTION | P VALUE | ACCESSION



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



1.9 



1.9 



C. tentans ORFs (A- 
i) for hemoglobin 



L22383 



U82814 



U18504



X53676 



Homo sapiens DNA 
sequence, repeat 

region. 

Hirudo medicinalis 
neuron-specific 
protein mRNA, 
complete cds 



1.9 



1.9 



1.9 



1.9 



Haplomitrium 
hookeri 18S rRNA
gene, partial 

sequence. 

Pseudomonas stutzeri 

DFY genes 
involved in copper 
processing 



1.9 



1.9 



U60086 



U33447 



MS 1327 



Dictyostelium 
discoideum multidrug
resistance
transporter/Ser 
protease (tagC) 
mRNA. complete cds. 



1.9 



Human putative G 
protein-coupled 
receptor (GPR17) 
gene, complete cds 



Sus scrofa lactoferrin
mRNA. complete cds 
>:: gb|I2S42 1 (128421 
Sequence 5 from 
patent US 5571691 



1.9 



1.8



DESCRIPTION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



3822533 



(AF094531) immunoglobulin 
heavy chain precursor 



2.0 



1083969 



hypothetical protein 6 - fowlpox 
[virus virus] 



2980781 | (AL022198) putative protein



3879530 



(Z49130) cDNA EST
yk486b9.3 comes from this 
gene; cDNA EST yk486b9.5 
comes from this gene 



3880034 



(Z75550) similar to cell division 
control protein [Caenorhabditis 
elegans]



2.0 



0.70 



6e-05 



7e-14 



<NONE> 



<NONE> 



\ 



<NONE> 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



98 



I 



DESCRIPTION 



Y07622



S.iniae lctP & lctO



genes and ORF1 



99 



M60474 



100 Y13901 



101 



U44400 



102 



U92808 



103 



Mouse myristoylated 
alanine-rich C-kinase 
substrate (MARCKS) 
mRNA. complete cds 



Homo sapiens FGFR- 
4 gene



Human Down 
Syndrome region of 
chromosome 21, 
clone A31D6-1D6. 



Ruminococcus albus
beta-glucosidase 
(gluA) mRNA, 
complete cds 



L25051



104 | AE000546



105 | J00978



106 



107 



Candida albicans 
argininosuccinate 
lyase (ARG4) gene,
complete cds. 



Helicobacter pylori 
26695 section 24 of 
134 of the complete 
genome 



Xenopus laevis major 
beta-globin gene, 
complete cds. 



U41716



108 



X66286 



U76636 



109 J00664 



110 



M21535 



Human 
immunodeficiency 
virus type 1 isolate 
JW95-5, vpr gene, 
complete cds 



G.gallus mRNA for 
tensin



Xenopus calbindin
D28k mRNA, 
complete cds 



rabbit embryonic beta 



4-globin gene.



Human erg protein
(ets-related gene) 
mRNA. complete cds 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)







SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



111 | M80329



112 | D37887



113 | AF019763



114 | AF025967



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



Rat troponin I
cardiac isoform gene 
complete cds 



Cyprinus carpio c-

myc gene for c-Myc,

complete cds
Homo sapiens

protein-coupled 

receptor kinase 1 and 

G protein- coupled 

receptor kinase lb 

(GRK1) gene,

alternatively spliced, 

alternative exon 6, 

exon 7, and partial 

cds 



DESCRIPTION 



1.8 



999450 



1.8 



3023408 



Helicobacter pylori 
J166 virulence 
regulon 
transcriptional 
activator homolog 
gene, partial cds, 
strain-specific
genomic sequence B2 



1.8 



498643 



115 | U13133



Xenopus laevis 
(Xwnt-4) mRNA, 
complete cds. 



1.8 



3850108 



1.8 



2494853 



(Z46595) incomplete interleukin
11 receptor isoform [Homo
sapiens]

BRANCHED-CHAIN AMINO 
ACID TRANSPORT SYSTEM 
CARRIER PROTEIN 
(BRANCHED-CHAIN AMINO
ACID UPTAKE CARRIER) 
>gi|1075007|pir||D64056 
membrane- associated 
component, branched amino 
acid transport system (brnQ)
homolog - Haemophilus
influenzae (strain Rd KW20)
system II carrier protein (brnQ)
[Haemophilus influenzae Rd]



P VALUE 



(U10270) G-box binding factor 
1 [Zea mays]



(AL033388) putative calcium- 
transporting atpase 

[Schizosaccharomyces pombe]
PROBABLE



HYDROXYACYLGLUTATHI
ONE HYDROLASE
(GLYOXALASE II) (GLX II) 
protein [Escherichia coli] 
>gi| 1786406 (AE000130) 
probable 

hydroxyacylglutathione 
hydrolase [Escherichia coli] 



7.3 



7.2 



7.2 



5.7 



5.5 






Nearest Neighbor (BlastN vs. Genbank)



SEQ 

ID | ACCESSION



116 



117 



DESCRIPTION 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



P VALUE | ACCESSION 



S68944 



118 



M92905 



X12429



119 



120 



Na+/Cl(-)-dependent 
neurotransmitter 
transporter 



Rat calcium channel 
alpha-1 subunit (rbB-
I) mRNA, complete
cds. 



Xenopus laevis U1
70K gene exon 10 



D83333 



Mouse hepatitis virus 
genomic RNA for 
spike protein, partial 
cds 



AF016972



121 AB010741 



122 



U32844 



Cervus elaphus 
REDDEER 
mitochondrial D- 
loop, complete 
sequence 



Oncorhynchus mykiss
mRNA for rtSox24, 
complete cds 



Haemophilus 
influenzae Rd section
159 of 163 of the 
complete genome 



1.8 



1.8 



1.8 



DESCRIPTION 



2276316 



3165522 



2735957 



1.8 



3876559 



1.8 



3878057 



1.8 



1730805 



1.8 



P VALUE 



(Z96810) GLYT-1 LIKE [Homo
sapiens] 



72S910 



(AF067607) Similar to cuticular 
collagen; C18H7.3 



(AF015685) reverse
transcriptase domain protein 



nrrr 



ft 

cyclin A/CDK2-associated
protein P19 (RNA polymerase 
elongation factor) 
(SW:SKP1_HUMAN); cDNA
EST EMBL:T00114 comes
from this gene; cDNA EST 
yk390fl 1.5 comes from this 
gene; cDNA EST yk402e 1 1 .5 

CO... 

>gi|3877216|gnl|PID|e!346850 
protein P19 (RNA polymerase 
elongation factor) gene; cDNA 
EST yk390fll.5 comes from 
this gene; cDNA EST
yk402ell.5co. 



A-TYPE INCLUSION 
PROTEIN (ATI) camelpox 
virus >gi|62381 (X69774) 
34kDa A-type inclusion protein 
unidentified! 



5.5 



5.5 



3.3 



3.3 



(Z99942) similar to von 

Willebrand factor type A 

domain; cDNA EST yk412d4.5 

comes from this gene; cDNA 

EST yk412d4.3 comes from this 

gene | 3 o 

HYPUlHhllLAL^l.U KB 

PROTEIN IN RPS3-PSD1 

INTERGENIC REGION 

>gi|2132762|pir||S63129

probable membrane protein

YNL174w - yeast

(Saccharomyces cerevisiae)

>gi|1302152|gnl|PID|e239548

(Z71451) ORF YNL174w

[Saccharomyces cerevisiae] | 2.5



1.9 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



123 | U18321 



126 



127 



130 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



Human ionizing 
radiation resistance 
conferring protein 
mRNA. complete cds 



124 | M28668



Human cystic fibrosis 
mRNA. encoding a 
presumed 
transmembrane 
conductance regulator 
(CFTR). > :: 
gb|I11500|I 11500 
Sequence 1 from 
Patent US 5407796 



1.8 



1.8 



125 | AF064553



Mus musculus NSD1 
protein mRNA, 
complete cds 



Human mRNA for 
KIAA0316 gene,
AB002314 | complete cds



(subclone 10_d2 from 
PI H21) DNA 
L42096 | sequence. 



1.8 



1.8 



R.norvegicus renin
128 | M37278 | gene, exons 1-9.



129 | X82879



Artificial sequences
DNA for ART 2
consensus



D89729 



Homo sapiens mRNA 
for CRM1 protein,
complete cds



131 | U7S076 



Mus musculus 
sepiapterin reductase 
gene, exons 1 and 2



1.8 



1.3 



1.8 



1.8



1.8 



DESCRIPTION 



2133273 



90492 



2501207 



115131 



2135624 



4050087 



310929 



3559944 



2984225 



P VALUE 



ribosomal protein YS7 homolog 
Emericella nidulans 



filaggrin precursor - mouse 
(fragment) 



(fragment) 

PROBABLE PROTEIN



DISULFIDE ISOMERASE P5
PRECURSOR >gi|1065461
(U40411) Similar to protein
disulfide-isomerase.
[Caenorhabditis elegans]

REGULATORY PROTEIN

BRLA (BRISTLE A PROTEIN)
>gi|83718|pir||A28913
regulatory protein brlA -
Emericella nidulans >gi|168029
(M20631) brlA protein 



metalloproteinase I (EC 3.4.24.
■ human 



(AF109907) S164 [Homo
sapiens] 



(L13442) cysteine-rich extensin-
like protein-4 [Nicotiana
tabacum]



(AJ010792) Muc5AC protein
[Mus musculus] 



(AE000766) enolase- 
phosphatase E-l [Aquifex 
aeolicus]



1.4 



0.87 



0.87 



0.84 



0.65 



0.58 



0.52 



0.38 



0.095 






SEQ 



Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION



132 



Paramecium 168G 
gene for 168G 
X52133 | surface protein



133 



134 



Human desmoplakin I
M77830 | mRNA, complete cds.



Plasmodium berghei 
AJ224150 | EF-1 alpha A-gene



135 



Mus musculus 
somatostatin receptor
2 gene, exon1 and 5'
AJ005518 | flanking region



136 



Ralstonia eutropha
megaplasmid pHG1
nitric oxide reductase
(norB) gene,
AF002217 | complete cds



137 | AF039035



138 



M81769 



139 | Y11106



Caenorhabditis 
elegans cosmid 
C53A3 



S.domesticus 
immunoglobulin 
rearranged gamma 
chain mRNA, VJC 
region, complete cds.



P.pastoris PYC1 gene



140 J US7803 



Human putative
Ca2+/calmodulin-
dependent protein
kinase kinase gene. 3' 
flanking region, 
partial sequence 



1.8 



1.8 



1.8 



1.8 



1.8 



1.8 



1.8 



115316 



1397246 



1353761 



1326350 



3393018 



3S50109 



30S0527 



DESCRIPTION 



P VALUE 



CHAIN PRECURSOR 
(ENDOTHELIAL 
COLLAGEN) 

>gi|105686|pir||S15435 collagen
alpha 1(VIII) chain precursor -

(U6W44) coded for by C.

elegans cDNA ykl 12f3.5; coded 
for by C. elegans cDNA 
cm2Id2; coded for by C. 
elegans cDNA CEESR07F; 
coded for by C. elegans cDNA 
ykl 12f3.3; coded for by C. 
elegans cDNA CEESR29F 
[Caenorhabditis elegans 1 



(U43 192) myosin II heavy chain 
[Naegleria fowleri] 

(U58748) similar to potential 
transmembrane domains in S. 
cerevisiae nuclear division
RFT1 protein (SP:P38206)



2e-05 



(AL031174) hypothetical
protein



2e-08 



1175412 



(AL033388) 3-oxoacyl-[acyl- 
carrier-protein]-synthase



(AL022600) putative mannose-1 
phosphate guanyltransferase
[Schizosaccharomyces pombe]



HYPOTHETICAL 24.2 KD 
PROTEIN C13A11.03 IN
CHROMOSOME I >gi|984224 
(Z54096) unknown 



3e-14 



2828280



(AL021687) putative protein

[Arabidopsis thaliana]
>gi|2832633|gnl|PID|e1249651
(AL021711) putative protein
[Arabidopsis thaliana]



1e-15



3e-17 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



141 | AE001430



142 I L19708 



Plasmodium 
falciparum 
chromosome 2, 
section 67 of 73 of 
the complete 
sequence 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



Rat N-methyl-D- 
aspartate receptor 
(NMDAR1) gene, 
first exon.



143 | Y10728



144 | AB006631



145 | AF106967



146 | AE001073



Mus musculus 13 
protein mRNA, 
complete cds 



Archaeoglobus 



fulgidus section 34 of
172 of the complete 
genome 



147 | U12977



■Mi 



148 | M27038



Pseudomonas
lemoignei poly(3-
hydroxybutyrate)
depolymerase A
precursor (phaZ5)
gene, complete cds,
and glycerol-3-
phosphate-
dehydrogenase
homolog, complete

cds. 

Mus musculus

(SK/CamRk)
germline IgK chain
gene, J1-5 region.



ns) 



DESCRIPTION 



P VALUE 



1.8 



1931647 



1.8 



1731181 



schwa rzi 
mitochondrial cytb 
gene, partial | 1.8



Homo sapiens mRNA| 
for KIAA0293 gene, 
partial cds | 1.8 



3878644 



4176500 



1.7 



<NONE> 



1.7 



<NONE> 



1.7 



<NONE> 



1.7 



<NONE> 



(U95973) endomembrane 

protein EMP70 precursor isolog
H^UlRbl lCAL^lO) F 



2e-20 



3e-21



PROTEIN C14A4.3 IN 
CHROMOSOME II 
>gi|3874230|gnl|PID|e1351618
protein (Swiss Prot accession 
number P38376); cDNA EST 
yk220el0.5 comes from this 

gene [Caenorhabditis elegans]
(ZillOj) predicted using
Genefinder; cDNA EST
yk303gll.5 comes from this 
gene; cDNA EST yk303g 1 1 .3 
comes from this gene 
[Caenorhabditis elegans] | 1e-28



(AL031177) dJ889M15.3 (novel
protein) 7^45



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



\ 



<NONE> 








Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






H.sapiens HBF-1 










149 


X74 142 

1 


mRNA for 
transcription factor 


1.7 


<NONE> 


<NONE> 


<NONE> 


150 


U40830 


Mreptococcus 
thermophilus DeoD 
gene, partial cds and 
EpsA, EpsB. EpsC, 
EpsD, EpsE. EpsF, 
EpsG, EpsH, EpsI, 
EpsJ, EpsK, EpsL, 
EpsM, Orf 14.9 
grotein genes, 
complete cds 


1,7 


t 

1 

<NONE> 


<NONE> 


<NONE> 


151 


L29172 


Rabbit Ig germline 
gamma H-chain 
(allotype dl2,el5)C- 
region gene. 3' end. 


1.7 


<NONE> 


<NONE> 


<NONE> I 


152 


M 19045 


Human lysozyme 
mRNA, complete cds. 


1.7 


<NONE> 


<NONE> 


<NONE> 


153 


AE001I59 


Borrelia burgdorferi 
(section 45 of 70) of 
the complete senome 


1.7 


<NONE> 


<NONE> 


<NONE> 


154 


LI 7027 


Plasmid pFdA (trom 
Fremyella 
diplosiphon) DNA 
sequence, including 
unidentified cds and 
stem loop. 


1.7 


' <NONE> 


<NONE> 


<NONE> 


1 ^ 
1 J J 


UI2232 


Arabidopsis thaliana 
Columbia GTP 
binding protein beta 
subunit (AGB1) 
mRNA, complete cds. 


• 

1.7 


* 

<NONE> 


<NONE> 


<NONE> 


156 


D42056 < 


Arabidopsis thaliana 
ATPK6 mRNA for 
ribosomal- protein S6 
dnase homolog. 
complete cds 


1.7 


<NONE> 


<NONE> 


<NONE> 


_L 57 


X9SM7 i 


^hizobium 
eguminosarum prsD, 
prsE, ORF3 genes 


1.7 J 


<NONE> 


<NONE> 


<NONE> 



'hue 



Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
158 | AF039084 | Spinacia oleracea heat shock 70 protein protein, complete cds | 1.7 | <NONE> | <NONE> | <NONE>
159 | Z12651 | R.norvegicus gene for catechol methyltransferase | 1.7 | <NONE> | <NONE> | <NONE>
160 | AF002968 | Fringilla coelebs mitochondrial control region; partial sequence | 1.7 | <NONE> | <NONE> | <NONE>
161 | AE001160 | Borrelia burgdorferi (section 46 of 70) of the complete genome | 1.7 | <NONE> | <NONE> | <NONE>
162 | U67553 | Methanococcus jannaschii section 95 of 150 of the complete genome | 1.7 | <NONE> | <NONE> | <NONE>
163 | M86247 | S.ruminantium plasmid pS23 DNA. | 1.7 | <NONE> | <NONE> | <NONE>
164 | S74436 | oIL-8=interleukin-8 [sheep, spleen cells, mRNA, 1435 nt] | 1.7 | <NONE> | <NONE> | <NONE>
165 | | Candida maltosa ALK7 (CYP52A10) and ALK8 complete | | <NONE> | <NONE> | <NONE>
166 | U02625 | Geotrichum candidum NRRL Y-553 lipase gene, partial cds. | 1.7 | 321245 | 230k bullous pemphigoid antigen BPM1 - mouse | 9.3
167 | Z58881 | H.sapiens CpG DNA, clone 114a4, reverse read cpg114a4.rt1a. | 1.7 | 1854675 | (U66298) bone morphogenetic protein-6 [Rattus norvegicus] | 9.1
168 | U43674 | Agrobacterium tumefaciens conjugal transfer region 1 genes | 1.7 | 1352066 | LARGE PROLINE-RICH PROTEIN BAT2 MHC class III histocompatibility antigen HLA-B-associated transcript 2 - human >gi|179339 (M33509) HLA-B-associated transcript 2 (BAT2) [Homo sapiens] >gi|179345 (M33518) HLA-B-associated transcript 2 (BAT2) [Homo sapiens] | 9.1






Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
169 | AL023827 | Caenorhabditis elegans cosmid Y12A6A, complete sequence [Caenorhabditis elegans] | 1.7 | 731440 | PROTOPORPHYRINOGEN yeast (Saccharomyces cerevisiae) >gi|603606 (U18778) Hem14p: protoporphyrinogen oxidase [Saccharomyces cerevisiae] >gi|1403536|gnl|PID|e249333 (Z71381) protoporphyrinogen oxidase [Saccharomyces cerevisiae] | 8.9
170 | X69662 | X.laevis mRNA for glutathione synthetase, large subunit | 1.7 | 4038057 | (AC005897) hypothetical protein [Arabidopsis thaliana] | 8.8
171 | Z35824 | S. cerevisiae chromosome II reading frame ORF YBL063w | 1.7 | 3021450 | (Y15515) prdl-a [Hydra vulgaris] | 7.0
172 | M65139 | Cowpea chlorotic mottle virus (CCMV) 1a protein gene, complete cds. | 1.7 | 2506307 | COLLAGEN ALPHA 1(XII) CHAIN PRECURSOR 1 (XII) chain - chicken >gi|222811|gnl|PID|d1001160 gallus] >gi|2326442|gnl|PID|e39435 (X61024) collagen type XII alpha 1 chain [Gallus gallus] | 7.0
173 | X15065 | Drosophila distal BX-C region (bithorax complex) pH189 5' region; | 1.7 | 1723625 | HYPOTHETICAL … KD PROTEIN IN ALPA-GABD INTERGENIC REGION (F87) >gi|1033124 (U36840) ORF_f87 [Escherichia coli] >gi|1788982 (AE000348) orf, hypothetical protein | 6.9






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



174 



S.cerevisiae 
[chromosome VI 
Z46255 lambda clone. 



Human CD4 
promoter, partial 
175 | UP 1Q66 sequence. 



176 | U34743



177 | U14662



178 



Homo sapiens 
PMS2L15 mRNA, 
AB017006 partial cds



179 



180 



U92651 



Brassica oleracea var. 
botrytis tonoplast 
intrinsic protein 
bobTIP26-1 mRNA,
complete cds



Lytechinus variegatus 
notch homolog
AF000634 mRNA, complete cds



DESCRIPTION 



1.7 



3875228 



Phalaenopsis sp. 
'hybrid SM9I08' 
homeobox protein 
mRNA, complete cds

baboon herpesvirus
HVP2 gB

glycoprotein (UL27) 
gene, complete cds. 



1.7 



125448 



1.7 



1022918 



1.7 



1.7 



3218378 



1465855 



1.7 



3023675 



P VALUE 



(Z46792) similar to lethal(1)
discs large- 1 tumor suppressor 
protein-like repeats; cDNA EST 
EMBL:D33495 comes from this 
gene; cDNA EST 
EMBL:D35117 comes from this
gene; cDNA EST 
EMBL:D36356 comes from this 
gene;cDNA EST EMB... 
>gi|3879984|gnl|PID|e1351767
suppressor protein-like repeats; 
cDNA EST EMBL:D33495 
comes from this gene;cDNA 
EST EMBL:D35117 comes
from this gene; cDNA EST 
EMBL:D36356 comes from this 
gene; cDNA EST EMB... 



THYMIDINE KINASE 
saimiriine herpesvirus 1 (strain 
11 [One]) >.gi|60341 



1.7 



148574 



DYNEIN HEAVY CHAIN,
CYTOSOLIC (DYHC) dynein
heavy chain
[Schizosaccharomyces pombe]



(M58520) endo-1,4-beta-
glucanase [Fibrobacter
succinogenes]



6.7 



6.7 



(U38184) ATPase subunit 6
[Trypanosoma cruzi] | 6.7

(AL023862) hypothetical
protein SC3F9.07 [Streptomyces
coelicolor] | 6.7



(U64859) glutamine-rich protein 
[Caenorhabditis elegans] | 6.7 



6.6 



6.6 



Nearest Neighbor (BlastN vs. Genbank)



SEQ 
ID 



ACCESSION 



DESCRIPTION 



181 



182 



183 



M92354 



P VALUE 



Arabidopsis thaliana 
anthranilate synthase 
alpha subunit gene, 
complete cds. 



AJ234856 



U76827 



184 



Hordeum vulgare 
genomic DNA 
fragment; clone 
MWG2234.rev 



Stercorarius 
parasiticus bird J33 
cytochrome b protein 
partial cds 



U05211 



185 



AF076974 



186 | AE000753



187 



Saccharomyces 
cerevisiae Ttplp 
(TTP1) gene, 
complete cds. 



Homo sapiens 
TRRAP protein 
(TRRAP) mRNA, 
complete cds 



188 



AF005638



M23090 



189 



190



M24001 



X59964 



Aquifex aeolicus 
section 85 of 109 of 
the complete genome



Tupaia glis 
apolipoprotein AI 
prepropeptide 
mRNA, complete cds 



Human germline IgK 
chain gene V3-region, 
clone Humkv328h5 



Mink enteritis virus 
antigenic type 2 
capsid protein genes 
VP1 and VP2, 
complete cds. 



H.sapiens CST4 gene 
for Cystatin D 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



1.7 



1.7 



1.7 



1.7 



1.7 



1.7 



1.7 



1.7 



1.7 



738308 



3142302 



3413810 



403173 



1170140 



1.7 | 1169357



3355682 



2257483 



DESCRIPTION 



2143504 



blue light photoreceptor 
[Arabidopsis thaliana] 



P VALUE 



(AC002411) Strong similarity to
myosin heavy chain gb|234293 
from A. thaliana. [Arabidopsis 
thaliana] 



(Y17034) Bassoon [Mus
musculus] 



DNA ADENINE METHYLASE
site-specific DNA- 
methyltransferase (adenine- 
specific) dam methylase gene 
product [Vibrio cholerae] 



(AL031124) putative secreted
lyase



(AB004534) pi003 
[Schizosaccharomyces pombe]



myotonic dystrophy kinase - 
mouse (fragment) kinase, DM- 
kinase {C-terminal, alternatively
spliced, clone delta II,III,IV,V}
[mice, brain, Peptide Partial,
1474 aa] [Mus sp.]



1766075 



(U37273) winged helix protein 
lcvVH-2 [Gallus gallus] 



2j ^® 



6.5 



6.5 



5.4 



(L24492) lipoprotein 

[Rhodococcus erythropolis] | 4.9 
PUTATIVE 

ENDOGLUCANASE TYPE K
PRECURSOR (ENDO-1,4-
BETA-GLUCANASE)
(CELLULASE) | 4.1



4.0 



4.0 



4.0 



3.9 



3.1 



SEQ
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



P VALUE 



191 



X95276 



P. falciparum
complete gene map of 
plastid-like DNA (IR- 
B) 



192 | D84487



193 



195 



196 



197 



L14851



194 | M97002



L07025 



S73149 



D86990 



Rat PMSG-induced 
ovarian mRNA, 
3' sequence, N10
Rattus norvegicus
neurexin III-alpha
gene, complete cds.

Xenopus laevis/gilli 
hybrid pseudo-IgH 
chain gene, V region 
^neLG7G342A 

delta-endotoxin 
(CryA(a)) gene, 5' 
end. > :: 

gb|I34520|I34520 
Sequence 1 from
patent US 5596071 >

gb|I39790|I39790
Sequence 1 from
patent US 5616495 > 



487 Sequence 1 from 
patent US 5753492 



insulin-like growth 
factor II {intron 7} 
[human, Genomic, 
1702 nt] 



Human (lambda) 
DNA for 

immunoglobulin light 
chain 



1.7 



1.7 



1.7 



1.7 



1.7 



1.7 



1.7 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



[HYPOTHETICAL! 1.7 KD 



P VALUE 



3219951 



PROTEIN C6B12.13 IN

CHROMOSOME I 

>gi|2330843|gnl|PID|e334047
pombe]



173164 



3323586 



2118407 



2496940 



3327038 



494367 



MHC sex-limited protein - 
mouse (fragment) musculus]



HYPOTHETICAL 53.4 KD 
PROTEIN D1054.13 IN
CHROMOSOME V
>gi|3875316|gnl|PID|e1344967



3.0 



(J02719) valyl-tRNA synthetase 
[Saccharomyces cerevisiae] | 2.3
(AF060869) single-strand
binding protein [Salmonella
typhimurium] | 2.3



2.3 



1.8 



(AB014512) KIAA0612 protein
[Homo sapiens] | 1.8

Fv Fragment (Murine Se 155-4) 
Complex With The 
Trisaccharide: Alpha-D- 
Galactose(l-2)[alpha-D- 
Abequose(l-3)]alpha- D- 
Mannose (Pl-Ome) (Part Of 
The Cell-Surface Carbohydrate 
Of Pathogenic Salmonella) | 1.8






Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
198 | L17027 | Plasmid pFdA (from Fremyella diplosiphon) DNA sequence, including unidentified cds and stem loop. | 1.7 | 1082702 | poliovirus receptor-related protein - human | 1.4
199 | AL022273 | Caenorhabditis elegans cosmid H22D14, complete sequence [Caenorhabditis elegans] | 1.7 | 3924605 | yPil^UO^H^z.) putative inhibitor of apoptosis [Arabidopsis thaliana] | 1.4
200 | U89926 | Drosophila melanogaster cut gene, partial sequence | 1.7 | 2245100 | (Z97343) DNA-binding protein homolog | 1.3
201 | Z25749 | H.sapiens gene for ribosomal protein S7 | 1.7 | 2493459 | PROTEIN KINASE C SUBSTRATE, 80 KD PROTEIN, HEAVY CHAIN (PKCSH) (80K-H PROTEIN) >gi|1215746 | 1.1
202 | U59841 | Fundulus heteroclitus lactate dehydrogenase B | 1.7 | 3005587 | (AF048977) Ser/Arg-related nuclear matrix protein [Homo sapiens] | U.o-
203 | X55763 | Rabbit mRNA for smooth muscle calcium channel blocker (CaCB) receptor | 1.7 | 3883128 | (Ar0o23U-:) arabinogalactan-protein [Arabidopsis thaliana] | 0.82
204 | Z75528 | Caenorhabditis elegans cosmid C18B12, complete sequence [Caenorhabditis elegans] | 1.7 | 940397 | (D10123) core [Hepatitis C virus] | 0.80
205 | U50912 | Human XIST gene, poly purine-pyrimidine repeat region | 1.7 | 2338027 | ^AFftOS*n01 large tegument protein [Alcelaphine herpesvirus 1] | 0.59
206 | X12817 | Ovis aries beta-lactoglobulin gene | 1.7 | 987050 | (X65335) lacZ gene product [unidentified cloning vector] | 0.45
207 | AF004419 | Homo sapiens troponin T (TNNT2) gene, exon 13 | 1.7 | 2996364 | (AF053947) unknown [Yersinia pestis] >gi|38S3090 | 0.22
208 | L43643 | Gallus domesticus DNA microsatellite marker MCW119 | 1.7 | 464896 | TRANSDUCIN-LIKE ENHANCER PROTEIN 1 enhancer-of-split homolog TLE-1 - human >gi|307510 | 0.20



Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
209 | Z73278 | S.cerevisiae chromosome XII reading frame ORF YLR106c | 1.7 | 1351657 | UVU/Vl'UU PI/'AI IT! U L!l\ PROTEIN C30D11.04C IN CHROMOSOME I >gi|2130411|pir||S62562 hypothetical protein SPAC30D11.4c - fission yeast nuclear pore complex protein [Schizosaccharomyces pombe] | 0.20
210 | M22345 | Mouse endogenous provirus gag, pol, and env region DNA. | 1.7 | 2444455 | (AF020765) hypothetical protein [Myxococcus xanthus] | 0.12
211 | AE000360 | Escherichia coli K-12 MG1655 section 250 of 400 of the complete genome | 1.7 | 2736361 | (AF039038) No definition line found [Caenorhabditis elegans] | 0.12
212 | AB020692 | Homo sapiens mRNA for KIAA0885 protein, complete cds | 1.7 | 2605924 | (AF029726) histidine kinase C [Dictyostelium discoideum] | 0.094
213 | S69429 | testis-determining gene/SRY homolog [Sminthopsis macroura=striped-faced dunnarts, Genomic, 855 nt] | 1.7 | 2499016 | TONB PROTEIN >gi|1666536 (U23764) TonB [Pseudomonas aeruginosa] | 0.092
214 | S69429 | testis-determining gene/SRY homolog [Sminthopsis macroura=striped-faced dunnarts, Genomic, 855 nt] | 1.7 | 2499016 | TONB PROTEIN >gi|1666536 (U23764) TonB [Pseudomonas aeruginosa] | 0.088
215 | U67205 | Mus musculus ACF7 neural isoform 3 (mACF7) mRNA, partial cds | 1.7 | 2047349 | (AF000198) weak similarity to HSP90 [Caenorhabditis elegans] | 0.052
216 | X98188 | Artificial DNA sequence for mammalian lambda-neo minichromosome, 1400 bp | 1.7 | 2493779 | PUTATIVE CUTICLE COLLAGEN C09G5.6 collagen; cDNA EST yk244c3.5 comes from this gene; cDNA EST yk244c3.3 comes from this gene [Caenorhabditis elegans] | 0.042
217 | U70139 | Mus musculus putative CCR4 protein mRNA, partial cds | 1.7 | 2252630 | (U95973) hypothetical protein [Arabidopsis thaliana] | 0.041



Nearest Neighbor (BlastN vs. Genbank)



SEQ 

ID | ACCESSION



222 



223 



224 



225 



226 



227 



DESCRIPTION 



218 | L38808



219 | Z72151



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



Homo sapiens alpha- 
1 type V collagen 
(COL5A1) gene, 5'
flank and exon 1.



B.napus mRNA for 
AMP-binding protein 



R.norvegicus mRNA 
for cysteine sulfonate 
220 | X94152 decarboxylase



Mouse stathmin gene 



221 | L20255 sequence. 



Rattus norvegicus 
glycine transporter 
L13600 mRNA, complete cds



P VALUE | ACCESSION 



Plasmodium berghei 
AJ224150 EF-1alpha A-gene



butyrophilin [mice, 
lactating mammary 
gland, mRNA Partial, 
S30642 b 193 ml 



C.elegans unc-86
gene encoding two 
alternative proteins, 
M22363 complete cds, 



M.musculus cgt gene 
X92123 exonl 



Ipomoea nil PKn2 
(knotted-like gene) 
AB016000 mRNA, complete cds



Bovine mRNA for

228 | D14133 synaptocanalin I



1.7 



1.7 



1.7 



1.7 



1.7 



1.7 



1.7 



1.7 



1.7 



1.7 



1.7 



DESCRIPTION 



P VALUE 



2895760 



190475 



2136212 



2317934 



726403 



2072290 



2695746 



2224683 



3874232 



2183083 



(AF045246) universal minicircle
sequence binding protein
minicircle sequence binding
protein [Crithidia fasciculata]



0.039 



(K02576) salivary proline-rich 
protein 1 [Homo sapiens] 



0.011 



synapsin IIb - human
>gi|1594277 (U40215) synapsin
IIb [Homo sapiens]



(U97553) unknown [murine
herpesvirus 68]



(U23175) similar to anion
exchange protein
[Caenorhabditis elegans]



(U95094) XL-INCENP 
[Xenopus laevis]



(AJ223010) Pmt2 
[Schizosaccharomyces pombe] 



(AB002369) KIAA0371 [Homo 
sapiens] 



(Z49909) similar to Prokaryotic 
ribonuclease PH 
[Caenorhabditis elegans] 



392527.7 



(AF000422) TTF-1 interacting 
peptide 5 [Homo sapiens]



j) similar to 
Uncharacterized protein family 
UPF0034, Double-stranded 
RNA binding motif; cDNA EST 
yk489b3.5 comes from this 
gene; cDNA EST yk439g7.5 
comes from this gene 
[Caenorhabditis elegans]



0.008 



0.006 



0.003 



0.001 



9e-04 



1e-04



3e-05 



1e-05



2e-06 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



229 | L01991



230 | X63016



231 | Z22802



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



Mus musculus TAFG- 
1-like neuronal 
glycoprotein (PCS) 

mRNA, complete cds.
Tomato yellow leaf 

curl virus Thailand 

isolate complete 

genome (TYLCV-TH 

B-DNA) 



H.sapiens 

microsatellite repeat. 

> :: 

gb|G34562|G34562 
human STS SHGC- 
51834 



DESCRIPTION 



P VALUE 



3006139 



(AL022299) hypothetical
protein



1.7 



3643608 



(AC005395) hypothetical 
protein [Arabidopsis thaliana]



1.7 



I002I0 



extensin precursor (clone Tom L 
|4) - tomato esculentum] 



4e-07 



1e-07



232 | K02765



Human complement 
component C3 
mRNA, alpha and 
beta subunits, 
complete cds. 



2984320 



(AE000773) acetoin utilization
protein [Aquifex aeolicus]



233 



Z74818 



S.cerevisiae 
chromosome XV
reading frame ORF
YOL076w 



(Z/ilUi) predicted using 
Genefinder; Similarity to 
Bacillus subtilis DNAJ protein 
gene; cDNA EST 
EMBL:C12520 comes from this
gene; cDNA EST 



3873700 



Pis mRNA for thimet 



234 



D21871 



1.7 



2632098 



7e-11



8e-13 



235 



Y14344



Gallus gallus gene 
encoding neurofascin, 
exons 9, 10, 11 & 12



1.7 



3876421 



EMBL:C12730 comes from this 
gene; cDNA EST yk200b6.5 
comes from this gene; cDNA 
EST yk349a12.5 comes from
this gene [Caenorhabditis
elegans]



3e-14 



236 | Z73608



S.cerevisiae
chromosome XVI
reading frame ORF
YPL252c



1.7 



1439663 



(U64605) C05D9.6 gene 
product [Caenorhabditis 
elegans]



6e-18 








Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
237 | AG000518 | Homo sapiens genomic DNA, 21q region, clone: T171N23 | 1.7 | 1174468 | OLIGOSACCHARYL TRANSFERASE STT3 SUBUNIT HOMOLOG >gi|529357 (U13019) No definition line found [Caenorhabditis elegans] | 6e-18
238 | D17716 | Human mRNA for N-acetylglucosaminyltransferase V, complete cds | 1.7 | 961446 | (D63877) KIAA0157 gene product is novel. | 5e-19
239 | AF102512 | Cheilodactylus vittatus country USA, Midway Island cytochrome c oxidase subunit I gene, mitochondrial gene encoding mitochondrial protein, partial cds | 1.7 | 1572756 | (U /Uo4a) uhjoz. i gene product [Caenorhabditis elegans] | 5e-40
240 | L30107 | Rattus norvegicus liver-specific transporter gene, promoter region. | 1.7 | 4176443 | (AL022238) dJ1042K10.4 (novel protein) | 3e-4V
241 | X91220 | H.sapiens mRNA for Na-Cl electroneutral thiazide-sensitive cotransporter | 1.7 | 347S637 | (AC00jD4o) R2v42j_1 [Homo sapiens] | 6e-54
242 | U97146 | Rattus norvegicus calcium-independent phospholipase A2 mRNA, complete cds | 1.6 | <NONE> | <NONE> | <NONE>
243 | Z48508 | Pea seed-borne mosaic virus RNA for coat protein and polymerase (partial) | 1.6 | <NONE> | <NONE> | <NONE>
244 | M1S349 | Rat leukocyte common antigen (L-CA) gene, exons 1 through 5. | 1.6 | <NONE> | <NONE> | <NONE>
245 | M13158 | Yeast (S.pombe) cdc25+ gene (mitosis initiation), complete cds. | 1.6 | <NONE> | <NONE> | <NONE>






Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



246 



U39712 



M17922
M89986 
M89986 



250 



U68725 



251 



X95199 



253 



214078 



254 



AB002314 



255 



M21488 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



Mycoplasma 



genitalium section 34
of 51 of the complete 
genome 



Mouse Murine 
urokinase-type
plasminogen activator
protein gene, 
complete cds. 
Human polymorphic 

loci in Xq28. 

Human polymorphic 
loci in Xq28. 



Rattus norvegicus 
Deleted in colorectal 
Cancer 



P.platessa GSTA, 
GSTA1, GSTA2, and 
PPTN senes 



D.melanogaster



T.aestivum
mitochondrion fMet, 
18S, 5S repeat unit 
DNA 



Human mRNA for 
KIAA0316 gene,
complete cds 



Human muscle
creatine kinase gene
(CKMM), exon 2.



DESCRIPTION 



1.6 



<NONE> 



3875750 
3261710 
2143805 



1.6 



1256804 



<NONE> 



0!l4<)!>) predicted us in 2 
Genefinder; cDNA EST 
yk410e3.3 comes from this 
gene; cDNA EST yk410e3.5 
comes from this gene 

[Caenorhabditis elegans]

(Z84724) psd [Mycobacterium 

tuberculosis] 

inositol- polyphosphate 4- 
phosphatase - rat 



(U51449) RING3 protein
[Xenopus laevis] 



1.6 



3915113 



YPOTHEJICaLSi Kt) 
PROTEIN IN COB INTRON 
>gi|2654230|gnl|PID|e1192341
(X02819) unidentified reading
frame [Schizosaccharomyces
pombe]



2501668 



1.6 



130997 



119399 



MALEYLACETATE 
REDUCTASE Pseudomonas 
cepacia >gi|643636 (U19883) 
maleylacetate reductase 
[Burkholderia cepacia] 



DYSTROPHIN-RELATED 
PROTEIN 2 sapiens] 



P VALUE 



REPETITIVE PROLINE-RICH 
CELL WALL PROTEIN 1 
PRECURSOR 

>gi|8!S09|pir||A29324 proline- 
rich protein precursor - soybean 
>gi|170049 (J02746) proline-
rich protein [Glycine max]



ENV POLYPROTEIN
PRECURSOR (COAT
POLYPROTEIN) [CONTAINS:
COAT PROTEIN GP62; COAT 
PROTEIN GP40] 



<NONE> 



5.8 



4.9 



4.8 



3.6 



2.8 



2.2 








Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
256 | AE001164 | Borrelia burgdorferi (section 50 of 70) of the complete genome | 1.6 | 4050089 | (AF109907) hypothetical protein [Homo sapiens] | 1.5
257 | X61757 | M.musculus rearranged T-cell receptor beta variable region (Vb17a) | 1.6 | 3377766 | (AF080090) semaphorin IV isoform b [Mus musculus] | 1.2
258 | M15346 | T.cruzi tandemly repeated gene encoding an 85 kDa antigen with homology to heat shock proteins. | 1.6 | 2804437 | (AF043695) similar to zinc metalloprotease family of peptidases [Caenorhabditis elegans] | 0.41
259 | L39018 | Rattus norvegicus sodium channel protein 6 (SCP6) mRNA, complete cds | 1.6 | 2920535 | (AF018081) type XVIII collagen [Homo sapiens] | 0.037
260 | M29483 | Human leukocyte adhesion protein p150,95 alpha subunit gene, exons 7-15. | 1.6 | 1840045 | (U49082) transporter protein [Homo sapiens] | 2e-09
261 | L06844 | Aspergillus niger beta D-fructofuranosidase (suc1) gene, one exon. | 1.6 | 4206210 | (AF071527) putative calcium channel [Arabidopsis thaliana] | 9e-10
262 | M10946 | Chicken aldolase B gene, complete cds, clones lambda-C(l 1.1.4). | 1.6 | 2746/75 | (AF040640) similar to peptidase family C19 (ubiquitin carboxyl-terminal hydrolase) [Caenorhabditis elegans] | le- j i
263 | X07881 | Human gene PRB3L for proline-rich protein G1 | 1.5 | <NONE> | <NONE> | <NONE>
264 | U22260 | Nicotiana tabacum UMP synthase (pyr5-6) mRNA, partial cds | 1.5 | 3880923 | (Z99271) similar to Reverse transcriptase comes from this gene [Caenorhabditis elegans] | 0.50
265 | U76759 | Mus musculus nuclear protein NIP45 mRNA, complete cds | 1.4 | 1330394 | (U58761) C01F1.6 gene product [Caenorhabditis elegans] | 8.9









Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
266 | AF076470 | Rice tungro bacilliform virus Serdang strain, complete genome | 1.4 | 1703461 | POTASSIUM-TRANSPORTING ATPASE BETA CHAIN (PROTON PUMP) (GASTRIC H+/K+ ATPASE BETA SUBUNIT) 3.6.1.36) beta chain - human >gi|184105 (M75110) H,K-ATPase beta subunit [Homo sapiens] | 8.9
267 | X64659 | C.jacchus interferon gene for interferon gamma | 1.4 | 1486485 | (U28832) US10 [Gallid herpesvirus 1] >gi|1486497 | 6.8
268 | U11825 | Schistosoma japonicum structural muscle protein paramyosin mRNA, complete cds. | 0.88 | <NONE> | <NONE> | <NONE>
269 | D84278 | Human DNA for CD38, exon 1 | 0.68 | 3766363 | (AL031907) hypothetical serine rich protein [Schizosaccharomyces pombe] | 3.0
270 | M59755 | Bovine lens aldose reductase pseudogene, 3' end. | 0.67 | <NONE> | <NONE> | <NONE>
271 | M81758 | Homo sapiens skeletal muscle voltage-dependent sodium channel alpha subunit (SkM1) mRNA, complete cds. | 0.65 | 2437819 | (Z86105) 1,4-beta-glucanase [Anaerocellum thermophilum] | 3.6
272 | L01965 | Human type IV sodium channel alpha polypeptide | 0.64 | 2437819 | (Z86105) 1,4-beta-glucanase [Anaerocellum thermophilum] | 3.5
273 | U90122 | Danio rerio bone morphogenetic protein-4 (bmp4) mRNA, partial cds | 0.63 | 2983532 | (AE000720) formate dehydrogenase alpha subunit [Aquifex aeolicus] | 7.9
274 | L41624 | Hylobates lar mucin (MUC1) gene, exons 1-6. | 0.63 | 1517808 | (D79215) FGF-10 [Rattus norvegicus] | 0.91






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



P VALUE 



275 | AF030881



276 



277 



Fugu rubripes sushi 
retrotransposon gag 
polyprotein (gag) and 
pol polyprotein (pol) 
genes, complete cds 



U52909 



Arabidopsis thaliana
U1 snRNP 70K
protein gene,
complete cds



Homo sapiens 
putative GR6 protein 
(GR6) mRNA, 
AF008192 complete cds



278 



279 



Human fatty acid
binding protein
(FABP3) gene,
U17081 complete cds



280 



Homo sapiens mRNA 
for KIAA0797 
AB018340 protein, partial cds 



Y00093 



H.sapiens mRNA for 
leukocyte adhesion
glycoprotein p150,95



281 



282 



283 



Human cathepsin D 
(catD) gene, exons 7, 
M63138 8, and 9. 



N. sylvestris DNA for 
spacer region 
between 25S and 18S 
X76056 ribosomal RNA genes



B.taurus mRNA for 
X74501 ACTH receptor



284 



Rat F1-ATPase beta
subunit mRNA, 3' 
M57634 end. 



0.63 



0.62 



0.62 



0.62 



0.62 



0.62 



0.62 



0.62 



0.62 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



1519696 



(U0/y3o> coded tor by C 



elegans cDNA ykl J M).o; coded 
for by C. elegans cDNA 
ykl59h6.3; coded for by C. 
elegans cDNA ykl26f9.3; coded 
for by C. elegans cDNA 
ykl59h6.5 [Caenorhabditis 
elegans] 



P VALUE 



<NONE> 



<NONE> 



3800934 



(AF100655) contains similarity 
to ser/thr protein kinases 
[Caenorhabditis elegans]



0.38 



<NONE> 



(AF049709) tyrosyl protein 
3617848 | sulfotransferase-A; TPST-A



424044 



VP5 protein - porcine rotavirus 
>gi|61355



1054945 



(U38621) polyprotein [Tobacco
vein mottling virus]



136810 



GLYCOPROTEIN M 
>gi|73791|pir||WMBE51 UL10 
protein - human herpesvirus 1 1- 
473) [Human herpesvirus 1] 
>gi|221732|gnl|PID|d1002131



2661176 



(U76671) putative cds 
[Rhodobacter sphaeroides] 



4249552 



(AB001075) galectin-2 related 
protein 



2119692 



transforming growth factor-beta
type III receptor - chicken
>gi|511843 (L01121)
transforming growth factor-beta
type III receptor [Gallus gallus]



9.7 



7.7 



7.7 



4.5 



3.5 



2.0 



2.0 



1.1






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION



DESCRIPTION 



285 



Homo sapiens 
SERCA3 gene, exons 
Y15724 1-7 (and joined CDS) | 0.62



286 



Plasmodium
falciparum DNA ***
SEQUENCING IN 
PROGRESS *** 
from contig 3-72, 
AL010142 complete sequence



2498164 



287 



Mus musculus Stat3 
gene, 5'-flanking
region and exon 1
AB008160 partial sequence



288 | AB018795



289 | Z69906



Halomonas marina 

gene for alginate 

lyase, complete cds
Human DNA

sequence from

cosmid E141E2, on 

chromosome 22, 

complete sequence 

[Homo sapiens] 



290 | U18259



Human clone CIITA- 
SMHCclass II 
transactivator CIITA
mRNA, complete cds



291 | X98890



S. tuberosum mRNA 
for inorganic 
phosphate 
transporter, StPT1



0.62 



0.62 



0.62 



0.61 



0.61 



0.61 



3183206 



466097 



3877493 



<NONE> 



1483567 



475724 



ASPARTYL/ASPARAGINYL

(ASPARTATE BETA-
HYDROXYLASE) (ASP BETA-
HYDROXYLASE) (PEPTIDE-
ASPARTATE BETA-
DIOXYGENASE) beta-
dioxygenase (EC 1.14.11.16) -
bovine >gi|162694 taurus]



P VALUE 



0.52 



HYPOTHETICAL PROTEIN 
KIAA0009 sapiens] 

PROTEIN ZK353.1 IN 
CHROMOSOME III 
>gi|1078903|pir||S44654 
ZK353. 1 protein - 

Caenorhabditis elegans 

>gi|289757(L15313) putative 

Caenorhabditis elegans] 
l&j) similar to ATPases

associated with various cellular 
activities (AAA); cDNA EST 
EMBL:Z 14623 comes from this 
gene; cDNA EST 
EMBL:D7509U comes from this
gene; cDNA EST
EMBL:D72255 comes from this
gene; cDNA EST yk200e4...



4e-07 



1e-35



3e-46 



<NONE> 



<NONE> 



(X79983) viral proteinase
[Pseudorabies virus]



9.8 



(U0SS84) protein VIII precursor, 
[Bovine adenovirus type 3]



7.6 






SEQ 
ID 



292 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



Rattus norvegicus 5-
oxo-L-prolinase
U70825 mRNA, complete cds



293 



Homo sapiens 
(subclone 2_a9 from
P1 H49) DNA
L81667 sequence



294 



Aquifex aeolicus
section 92 of 109 of
AE000760 the complete genome



295 



U58512 



Mus musculus Rho- 
associated, coiled- 
coil forming protein 
kinase p160 ROCK-1
mRNA, complete cds 



296 



U27459 



Human origin, 
recognition complex 
protein 2 homolog 
hORC2L mRNA, 
complete cds



297 



L36680 



Pisum sativum S- 
adenosylmethionine 
synthase mRNA, 3' 
end. 



298 



Aquifex aeolicus 
section 5 of 109 of
AE000673 the complete genome



299 



Homo sapiens full 
length insert cDNA 
AF086310 clone ZD51F08 



300 



AJ009675 



Agrotis ipsilon 
mRNA for 3-hydroxy- 
3-methylglutaryl 
coenzyme A 
reductase 



DESCRIPTION 



0.61 



301 



Homo sapiens
chromosome 19,
cosmid F18382B.
centromeric end,
complete sequence
AC005577 [Homo sapiens]



0.61 



0.61 



0.61 



0.61 



0.61 



0.61 



0.61 



0.61 



0.60 



733543 



2565087 



2811092 



295671 



200285 



2285790 



3395782 



3646450 



P VALUE 



(U23448) similar to genome 
polyprotein 

(SP:POLG_BVDVN, P19711);
alternative splicing to C04A2.7a 



(U80759) CAGH4 alternate 
open reading frame [Homo 
sapiens]



4176370 



<NONE> 



HOMEOBOX PROTEIN HOX- 
A3 (HOX-I.5) homeobox- 
containing transcription factor 
[Mus musculus]



(L11275) selected as a weak
suppressor of a mutant of the
subunit AC40 of DNA
dependant RNA polymerase I
and III



4.4 



(M97900) putative open reading
frame [Mus musculus]



(AB0020S6) p47 [Rattus 
norvegicus]



(AF058446) histone 
macroH2A1.2 [Gallus gallus]



(AL031603) conserved 
hypothetical protein. 
[Schizosaccharomyces pombe]



(AC005058) similar to calcium- 
independent phospholipase A2; 
similar to AC004392 
(PID:g3367519) [Homo 
sapiens] 



<NONE> 



3.3 



2.6 



1.5 



0.66 



4e-12



6e-27 



8e-29 



2e-73 



<NONE> 1 






SEQ 
ID 



304 



310 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



302 | U40454



Candida albicans 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



303 | J01390



topoisomerase type I 
(C ATOP I) gene, 
complete cds | 0.60 

Emericella nidulans 
mtDNA between 
h2/h5 and bh2to2 
junctions, genes for 
ATPase subunit 6, 
cytochrome oxidase 
subunit 3, seven, 
unidentified proteins, 
twentyfour tRNA's 
and L-rRNA. | 0.60



Plasmodium 
falciparum RNA 
polymerase I gene, 
L11172 | complete cds



Caenorhabditis 
elegans cosmid
F39H11, complete 
sequence 
[Caenorhabditis 

305 I ZS1079 elegans] 

S.cerevisiae 

reading frame ORF 

306 I Z49627 YJR127c 

Rattus norvegicus H,
K-ATPase alpha 2
gene, alternatively
spliced products and

307 | U94911 | partial cds

Methanococcus
jannaschii section 18

of 150 of the

308 | U67476 | complete genome

Methanococcus
jannaschii section 55 
of 150 of the 



0.60 



0.60 



0.60 



0.60 



309 | U67513 | complete genome



Haemophilus ducreyi 
lipoprotein gene, 
U57817 | complete cds



0.60 



0.60 



0.60 






DESCRIPTION 



<NONE> 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



118751 



<NONE> 
MAJOR DNA-BINDING
PROTEIN herpesvirus 1 (strain 

ssDNA-binding protein 
[Saimiriine herpesvirus 2] 



2213862 



1749688 



3327421 



4008577 



(AF003086) PfSNF2L 
[Plasmodium falciparum]



<NONE> 



<NONE> 



<NONE> 



9.6 



(D89240) unnamed protein 
product 



(U97068) zonadhesin [Mus 
musculus]



(AL034491) conserved 
hypothetical protein
[Schizosaccharomyces pombe]



7.4 



5.7 



2.5 






SEQ 
ID 



311 



Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



312 



X80700 



L42167 



H.sapiens G17 gene 



313 



Mus musculus (clone 
R24) rds gene, partial 
cds 



Human hMSH6 
U54777 | mRNA, complete cds



Human mRNA for
KIAA0232 gene,
314 | D86985 | complete cds



315 



316



317 



318 



Rat liver mRNA for 
D43964 | Kan-1, complete cds
Rattus norvegicus

CTD-binding SR-like
protein rA4 mRNA,
U49058 | partial cds



U.ruddi 

mitochondrial 12S 
X84388 ribosomal RNA 



AF125447



319 



320 



U20189 



M63962 



321 | AJ132366



Caenorhabditis 
elegans cosmid 
Y14H12B 



Hyoscyamus muticus 
clone cVS2 
vetispiradiene 
synthase mRNA, 
partial cds. 



Human gastric H,K- 
ATPase catalytic 
subunit gene, 
complete cds. 



ACCESSION 



Helicobacter pylori 
(strain P1) comB and
pmi/algA (partial)
genes, and partial
ORF1 and ORF2




DESCRIPTION 



probable protein-tyrosine kinase



P VALUE 



(EC 2.7.1.112) RTK - Pacific 
electric ray >gi 129085 8 



1.5 



(AF033823) moira [Drosophila 
melanogaster] 



0.51 



(AF031087) mismatch repair 
protein MSH6 [Mus musculus] | 5e-07 



(U97006) No definition line

found [Caenorhabditis elegans] | 2e-07
(U^3J76) coded for by C.

elegans cDNA cm21e6; coded

for by C. elegans cDNA

cm01e2; similar to melibiose

carrier protein

(thiomethylgalactoside permease
II) | 5e-15



(U37500) RNA polymerase II 
largest subunit [Mus musculus] | 1e-19



(Z70205) predicted using 
Genefinder



2e-37 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 
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SEQ 
ID 



322 



323 



324 



325 



326 



329 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



Mus musculus
transcription factor
AP-2 (AP-2) gene,
alternative exon 1a,
and isoform 2, partial
U17289 | cds.



331 



332 



0.59 



S.cerevisiae 
chromosome XIV 
reading frame ORF
Z71466 | YNL190w



0.59 



Beet soil-borne virus 
genes for 13K, 22K 
Z66493 and 48K proteins 



0.59 



Homo sapiens 
prostasin mRNA, 
L41351 | complete cds



0.59 



S.lincolnensis gene
for 16S ribosomal 
X79854 RNA 



0.59 



Strongylocentrotus 
purpuratus mRNA for
327 | AJ223356 | SuDp98 protein



0.59 



H.sapiens mRNA for 
328 | X86019 | PRPL-2 protein



0.59 



Xiphias gladius 
U75528 partial cds 



0.59 



330 AC005573 



L19180 



L32090 



Homo sapiens 
chromosome 5, PAC 
clone 202el3 



Rat receptor-linked 
protein tyrosine 
phosphatase 



Listeria 

monocytogenes secA
gene, complete cds. 



0.59 



0.59 



0.59 



DESCRIPTION 



2459419 



3875542 



2119867 



729212 



3702828 



2495704 



1743341 



P VALUE 



(AC002332) hypothetical 
protein [Arabidopsis thaliana]



(Z67990) Similarity to Rat 
amiloride-sensitive sodium 
channel beta-subunit 



cryV465 protein - Bacillus 
thuringiensis thuringiensis]



CRYSTALLIN J1C crystallin
[Tripedalia cystophora]



(AF056577) high mobility 
group protein 1.2



HYPOTHETICAL PROTEIN 
KIAA0129 product is novel. 
[Homo sapiens]



T 10027) transcription factor 
TEF-1 [Mus musculus] 



1845995 



(U69477) envelope glycoprotein 

virusl 

tvpe 1 1 

"FT 



2506366 



DNA POLYMERASE

EPSILON SUBUNIT B DNA-
directed DNA polymerase (EC
2.7.7.7) II chain B - yeast
(Saccharomyces cerevisiae)
>gi|786319 (U25842) DNA
Polymerase epsilon, subunit B
(Swiss Prot. accession number
P24482) [Saccharomyces
cerevisiae]



1235974 



2291129 



(X96713) collagen [Globodera 
pallida]



(AF016415) No definition line
found [Caenorhabditis elegans]



9.4 



7.3 



7 7 



4.2 



3.2 



2.D 



2.5 



2.4 



1.4 



1.1 



0.83 



Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
333 | U24433 | Xenopus laevis syndecan-2 mRNA, complete cds. | 0.59 | 3355692 | (AL031124) hypothetical protein SC1C2.25c [Streptomyces coelicolor] | 0.64
334 | M23412 | Drosophila muscarinic acetylcholine receptor mRNA, complete cds | 0.59 | 168237 | (M76546) hydroxyproline-rich protein [Helianthus annuus] | 0.22
335 | AF060729 | Synaphea media chloroplast atpB-rbcL intergenic spacer region, partial sequence | 0.59 | 731596 | HYPOTHETICAL PROTEIN IN PRPS4-STE20 INTERGENIC REGION >gi|626567|pir||S46825 hypothetical protein YHL010c - yeast (Saccharomyces cerevisiae) >gi|2289881 (U11582) No definition line found [Saccharomyces cerevisiae] | 0.16
336 | AF029734 | Xanthobacter autotrophicus transcriptional activator AldR (aldR) gene, partial cds; and NAD-dependent chloroacetaldehyde dehydrogenase (aldB) gene, complete cds | 0.59 | 2498801 | PERIAXIN >gi|2143901|pir||I58157 periaxin - rat >gi|505297 (Z29649) periaxin [Rattus norvegicus] | 0.13
337 | X95307 | C.reinhardtii LI818r-1 gene | 0.59 | 1723781 | HYPOTHETICAL PROTEIN IN TAF145-YOR1 INTERGENIC REGION >gi|2131717|pir||S64612 hypothetical protein YGR277c - yeast (Saccharomyces cerevisiae) >gi|1323505|gnl|PID|e243248 (Z73062) ORF YGR277c [Saccharomyces cerevisiae] | 1e-04
338 | M24572 | Dictyostelium discoideum tRNA-Glu-GAA gene, clone 'GluGAAS. | 0.59 | 1176186 | HYPOTHETICAL 43.3 KD GTP-BINDING PROTEIN IN DACB-RPMA INTERGENIC REGION >gi|60612Icolil | 3e-06
339 | U73733 | Human hMSH6 gene, exon 2 | 0.59 | 266^637 | (AF031087) mismatch repair protein MSH6 [Mus musculus] | 5e-07



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SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



Escherichia coli



P VALUE 



genomic DNA. (25.2
340 | D90747 | 25.6 min)



341 



343 



Human desmoplakin
J05211 | mRNA, 3' end.



Loligo pealii kinesin 
light chain mRNA, 
342 | L24441 | complete cds.



Human cardiac alpha- 
myosin heavy chain
(MYH6) gene, exons 
M25140 |2, 3 and 4. 



Homo sapiens
(subclone 9_h2 from
P1 H21) DNA
344 | L81932 | sequence



Homo sapiens full
length insert cDNA

345 | AF087966 | clone YU51G04

H.sapiens flow-sorted
chromosome 6 TaqI
fragment,

346 | Z78574 | SC6pA10G11



0.59 



0.59 



0.59 



0.58 



0.58 



0.58 



Blattella germanica
allatostatin 



0.58 



348 | AF015592



precursor, gene, 
347 | AF068061 complete cds 

Homo sapiens Cdc7 
(CDC7) mRNA. 
complete cds 

Methanosarcina 
barkeri atp operon: 
ATP synthase beta 
subunit (atpD), ATP 
synthase epsilon 
subunit (atpC), ATP 
synthase gene 1 
(atpl). ATP synthase 
a subunit subunit (... 



349 | AF028006



350 AB017032 



Mus musculus gene
for pancreatic trypsin,
complete cds 



0.58 



0.58 



0.58 



0.58 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



P VALUE 



134286 | DOLICHOL KINASE



246796 



major centromere protein. 
CENP-B [human, Peptide, 594 
aa]



547800 



KINESIN LIGHT CHAIN
(KLC) sea urchin
(Strongylocentrotus purpuratus) 
>gi|161530 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE>



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



3184291 



3170561 



(AF056704) synapsin IIIa
[Rattus norvegicus] 



6e-08 



4e-08 



5e-14 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



(AC004136) putative DNA 
polymerase III gamma subunit | 9.4 



9.2 






Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
351 | AF081585 | Dictyostelium discoideum developmental protein DG1110 (DG1110) gene, partial cds | 0.58 | 105417 | basic proline-rich peptide IB-8a - human | 9.2
352 | AF086322 | Homo sapiens full length insert cDNA clone ZD53E01 | 0.58 | 93026 | hypothetical protein - African swine fever virus (strain Malawi Lil-20/1) >gi|450758 (X71982) myeloid differentiation antigen homologue [African swine fever virus] >gi|903686 (M95672) unknown protein | 7.1
353 | AF088025 | Homo sapiens full length insert cDNA clone ZC19C04 | 0.58 | 2384644 | (U92805) thrombospondin-3 [Xenopus laevis] | 7.0
354 | AB002339 | Human mRNA for KIAA0341 gene, partial cds | 0.58 | 2135587 | M130 antigen (cytosolic variant 2) - human | 5.4
355 | U67548 | Methanococcus jannaschii section 90 of 150 of the complete genome | 0.58 | 2911094 | (AL021957) hypothetical protein Rv2174 | 4.2
356 | L07868 | Homo sapiens receptor tyrosine kinase (ERBB4) gene, complete cds. | 0.58 | 461922 | PYRUVATE DECARBOXYLASE (8-10 NM CYTOPLASMIC FILAMENT-ASSOCIATED PROTEIN) (P59NC) 4.1.1.1) - Neurospora crassa >gi|293948 (L09125) pyruvate decarboxylase [Neurospora crassa] >gi|1655909 | 4.2
357 | X03897 | Bacillus subtilis Sigma 43 operon with P23-dnaE-rpoD genes (dnaE for DNA primase, rpoD for RNA polymerase) | 0.58 | 1323704 | (U55387) similar to C. elegans F38E1.9 gene product encoded by GenBank Accession Number U41996 [Cricetulus griseus] | 4.1
358 | D76419 | Desulfovibrio vulgaris rbo gene for desulfoferrodoxin and rub gene for rubredoxin, complete cds | 0.58 | 3420047 | (AC004680) putative protein kinase [Arabidopsis thaliana] | 2.U
Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
359 | Z82174 | Human DNA sequence from cosmid B20F6 on chromosome 22, complete sequence [Homo sapiens] | 0.58 | 2145455 | (Y07866) catalase-peroxidase | 2.4
360 | M33642 | F.solani STI35 protein gene, complete cds. | 0.58 | 2896706 | (AL021897) hypothetical protein Rv1069c | 2.4
361 | U64873 | Mus musculus transforming growth factor alpha (TGF alpha) gene, partial cds | 0.58 | 3874437 | (Z81038) predicted using Genefinder; cDNA EST yk488a2.5 comes from this gene [Caenorhabditis elegans] | 1.8
362 | AB002132 | Macrophthalmus banzai mitochondrial DNA for 12S and 16S rRNA, partial and complete sequence | 0.58 | 2960022 | (AJ224676) rho type GEF [Drosophila melanogaster] | 1.8
363 | AF070070 | Caenorhabditis elegans MutS homolog (msh-5) mRNA, partial cds | 0.58 | 4098205 | (U75S69) Omp22 [Helicobacter pylori] | 1.8
364 | AF045240 | Staphylococcus epidermidis plasmid pIP1629 mobilization protein (mobC1), (orf69-1), (mobA1), | 0.58 | 4218117 | (AL033353) protein (fragment) | 0.62
365 | X61637 | H.sapiens Wilms tumor gene 1, exons 8 and 9 | 0.58 | 2331059 | (U88211) unknown [Gallus gallus] | 0.62
366 | AF039312 | Moraxella catarrhalis strain 4223 transferrin binding protein A (tbpA) and transferrin binding protein B (tbpB) genes, complete cds; and unknown gene | 0.58 | 120155 | FIBER PROTEIN >gi|74229|pir||ERADFM fiber protein - mouse adenovirus 1 >gi|209758 (M30594) fiber protein [Mastadenovirus mus I] | 0.27
367 | D87463 | Human mRNA for KIAA0273 gene, complete cds | 0.58 | 3861477 | (U94177) androgen receptor [Pan troglodytes] | 0.12
368 | U40342 | Mus musculus ninein mRNA, complete cds. | 0.58 | 4115936 | (AF118223) No definition line found [Arabidopsis thaliana] | 0.004
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Nearest Neighbor (BlastN vs. Genbank) 



SEQ 

ID [ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



DESCRIPTION 



P VALUE 



369 



S57235 



370 



U39391 



371 



D00056 



CD68=110kda 
transmembrane 
glycoprotein [human, 
promonocyte cell line 
U937, mRNA, 1722 
nt]



Mus musculus 
serotonin1A receptor
mRNA, complete cds. 



0.58 



0.58 



Monkey B-
lymphotropic
papovavirus genes for
VP-1, 2, 3 and large
T antigen, complete
and partial cds, strain
LPV-76 > ::
gb|M14494|PPMVPl
vl Monkey B-
lymphotropic
papovavirus mutant
(LPV-76) PstI B
fragment encoding
VP1, VP2, VP3 and
antigen.



2072501 



1469876 



(U96113) WWP1 [Homo 
sapiens] 



(D63481) The KIAA0147 gene
product is related to adenylyl 
cyclase. [Homo sapiens] 



0.58 



2462069 



(AJ001774) vanadium 
chloroperoxidase 



rope 

HVFLTlHbllLAL4J.8 KB 
PROTEIN IN NCE3-HHT2 
INTERGENIC REGION 
>gi|2131871|pir||S62957 
hypothetical protein YNL035c 
yeast (Saccharomyces 
cerevisiae) 

>gi|1301880|gnl|PID|e239670 
(Z71311) ORF YNL035c
[Saccharomyces cerevisiae] 



1e-04



1e-07



1e-08



372 



373 



M77182 



S72579 



374 



AF018165 



Amsacta 
entomopoxvirus 
spheroidin gene, 
complete cds, and 
four vaccinia related 
orfs. > :: 
b|I16670|I16670 
Sequence 1 from 
patent US 5476781



0.58 



igloo-S=growth-
associated protein
GAP-43 homolog



1730722 



0.58 



Tetraodon fluviatilis 
amyloid precursor 
protein mRNA, 
complete cds 



2689720 



321 993 S 



(AF037168) DnaJ homologue 
[Arabidopsis thaliana] 



HYPOTHETICAL 34.9 KD 
PROTEIN C57A10.11C IN 
CHROMOSOME I 
>gi|2058378|gnl|PID|e3 14002 
pombe] 



8e-14 



7e-14 



5e-22 



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SEQ 
ID 



378 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



375 I U81S03 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



376 I U09781 



377 I AC002143 



U23442 



379 U 17289 



Filobasidiella 
neoformans 
translation elongation 
factor EF1-alpha
(CnTEFl) mRNA, 

complete cds 

Candida albicans 
ATCC 18804, CBS 
562 peptide 
transporter gene, 
complete cds. 
Homo sapiens 
(subclone 4_bi0 from 
BACH 102) DNA 
sequence 



0.57 



0.57 



Tetrahymena 
thermophila RR 
internal deletion 
sequence. 
Mus musculus 

transcription factor 
AP-2 (AP-2) gene, 
alternative exon 1a,
and isoform 2, partial 
cds. 



380 I X70844 



381 AJ012159 



382 | X76571



na 

nuclear polyhedrosis 
virus gene for 
polyhedrin protein



Homo sapiens 5T4 
oncofetal trophoblast 
glycoprotein gene 



H.sapiens simple 
DNA sequence region 



cl one wglaS. 



0.57 



0.57 



0.57 



0.57 



0.57 



0.57 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



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ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



383 



384 



AF034434 



Vibrio cholerae
pathogenicity island, 
putative transposase, 
aldehyde 
dehydrogenase 
(aldA), toxR- 
activated gene A 
protein (tagA), 
putative inner 
membrane protein, 
and putative zinc 
metalloprotease
genes, complete cds; 
and... 



Mus musculus gene 
for TESP4, complete 
AB017031 cds 



385 



S.hispidus
mitochondrial DNA
for SSU ribosomal
X89788 | RNA gene



386 



Rat progesteron 
receptor gene, 5' 
L16921 | untranslated region.



387 



388 



AF027292 



AJ012581 



Homo sapiens
interferon regulatory

factor 6

Cicer arietinum
mRNA for
cytochrome P450



389 



L15363 



Human transfer RNA 
Met (TRMEPl) 
pseudogene, complete 
gene



390 



Helicobacter pylori 
26695 section 3 of
134 of the complete
AE000525 | genome



391 



AF0201S9 



Amblyomma 
americanum
ecdysteroid receptor
(AamEcR) mRNA,
3'UTR, region 1



DESCRIPTION 



P VALUE 



0.57 



<NONE> 



<NONE> 



0.57 



<NONE> 



<NONE> 



0.57 



0.57 



<NONE> 



<NONE> 



3323116 



(AE001251) femA protein,
putative [Treponema pallidum] 



<NONE> 



<NONE> 



<NONE> 



0.57 



0.57 



259790 



2131498 



(S48157) DNA polymerase-
primase 180 kda subunit
[Drosophila melanogaster,

Peptide, 1490 aa]

hypothetical protein YDR446w
yeast CAI: 0.11 [Saccharomyces
cerevisiae]



0.57 



0.57 



3228680 



(AF070935) GABA receptor
subunit [Musca domestica]



1938478 



(U97008) weak similarity to 
family I of G-protein coupled 
receptors [Caenorhabditis 
elegans]



0.57 



2072224 



(U94875) p40 [Borna disease 
virus] 



8.9 



6.7 



5.3 



5.2 



4.0 



4.0 
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SEQ 
ID 



392 



393 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



398 



399 



Human UbA52 gene
coding for ubiquitin
52 amino acid fusion
X56997 | protein



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



400 



Plasmodium 
falciparum DNA *** 
SEQUENCING IN 
PROGRESS *** 
from contig 4-81, 
AL010260 complete sequence 



394 I M99581 



395 I M38384 



396 | U32795 



Xenopus laevis

gamma-crystallin

(gcry3) gene,

complete cds.
Drosophila

melanogaster seven in

absentia mRNA,

complete cds.

Haemophilus
influenzae Rd section
110 of 163 of the
complete genome



0.57 



0.57 



Klebsiella 
pneumoniae nifX, 



397 | X12600 | nifW genes



Homo sapiens mRNA 
for KIAA0626 
AB014526 | protein, complete cds
Rhodococcus fascians
strain NRRL-B-
15096 hypothetical
protein gene,
AF063587 | complete cds



0.57 



0.57 



0.57 



0.57 



0.57 



Guinea pig estrone 
L11117 | sulfotransferase gene.



0.57 



0.57 






DESCRIPTION 



P VALUE 



2960113 



117233



141647 



1707127 



1173433 



2909562 



482390 



4104321 



82584 



(AL022121) hypothetical
protein Rv3689



(CYPIIC14) phenobarbital-
inducible, hepatic - rabbit P-450
[Oryctolagus cuniculus]
>gi|358265|prf||1306317A
cytochrome P450 [Oryctolagus 
cuniculus] 



GASTRULA ZINC FINGER
PROTEIN XLCGF44.2
>gi|85736|pir||S06571 finger
protein (clone XlcGF44-2) -
African clawed frog (fragment)



(U80454) T16A1.1
[Caenorhabditis elegans] 



IRON(III)-TRANSPORT 
SYSTEM PERMEASE 
PROTEIN SFUB >gi|152861 
(M33815) protein (sufB)



(AL021925) hypothetical

protein Rv2256c

insect-stage-specific protein - 
Trypanosoma cruzi >gi|162099
(M65021) insect stage-specific
antigen



(AF0345S2) vesicle associated 
protein [Rattus norvegicus] 



alpha/beta-gliadin precursor 
(clone A212) - wheat



4.0 



3.9 



3.0 



3.0 



2.3 



1.4 



0.61 



0.46 



0.35 



Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
401 | V00829 | mouse complete gene for a mouse kallikrein gene. Genes are mGK-1 (complete gene) and mGK-2 of hormones, e.g., grow... > :: gb|J00390|MUSKAL07 Mouse pseudo-kallikrein 2, exons 4 and 5, and kallikrein 1 gene, complete cds. | 0.57 | 2500916 | NUCLEAR HORMONE RECEPTOR NOR-2 receptor [Rattus norvegicus] >gi|1583604|prf||2121281A NOR-2 protein [Rattus norvegicus] | 0.20
402 | X53092 | Chicken mRNA for beta-2 subunit of neuronal nicotinic acetylcholine receptor | 0.57 | 1072256 | (U40953) similar to matrin F/G (SP:Q00910) containing C4-type zinc-fingers [Caenorhabditis elegans] | 0.031
403 | L07939 | Ovis ovis granulocyte colony stimulating factor | 0.57 | 3874345 | (Xixwte) predicted using Genefinder; Similarity to dehydrogenases; cDNA EST EMBL:D65800 comes from this gene; cDNA EST EMBL:D76184 comes from this gene; cDNA EST EMBL:D69322 comes from this gene; cDNA EST EMBL:C08158 comes f... | 3e-07
404 | U18061 | Colletotrichum gloeosporioides CAP20 (cap20) gene, complete cds. | 0.57 | 2914695 | (AC003974) putative ubiquitin specific protease | 9e-08
405 | Z73955 | - japonicus mRNA for small GTP-binding protein, UBI1G | 0.57 | 112894 | TUMOR NECROSIS FACTOR, ALPHA-INDUCED PROTEIN 3 (PUTATIVE DNA BINDING PROTEIN A20) (ZINC FINGER PROTEIN A20) >gi|107549|pir||A35797 probable DNA-binding protein A20 - human >gi|177S66 (M59465) A20 | 7e-08




Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
406 | X04335 | Petunia grp-1 gene for glycine-rich protein | 0.57 | 3876901 | (£7/6bU) Similarity to Human enoyl-CoA hydratase (SW:ECHM_HUMAN); cDNA EST EMBL:T00611 comes from this gene; cDNA EST yk203d10.3 comes from this gene; cDNA EST yk203d10.5 comes from this gene; cDNA EST yk457h5.3 comes from t... | 1e-27
407 | U40718 | Rattus norvegicus S-adenosylmethionine decarboxylase (AMDP2) pseudogene | 0.56 | <NONE> | <NONE> | <NONE>
408 | M60318 | S.cerevisiae SSD1 protein gene, complete cds. > :: gb|AR013983|AR013983 Sequence 8 from patent US 5773245 | 0.56 | <NONE> | <NONE> | <NONE>
409 | X60057 | Nicotiana tabacum blp4 mRNA for luminal binding protein (BiP) | 0.56 | <NONE> | <NONE> | <NONE>
410 | AF085930 | Homo sapiens full length insert cDNA clone YR55A09 | 0.56 | <NONE> | <NONE> | <NONE>
411 | AL010189 | Plasmodium falciparum DNA *** SEQUENCING IN PROGRESS *** from contig 3-102, complete sequence | 0.56 | <NONE> | <NONE> | <NONE>
412 | X05402 | Murine G-CSF gene for granulocyte colony stimulating factor precursor | 0.56 | <NONE> | <NONE> | <NONE>
413 | U92280 | Rattus norvegicus regulator of G-protein signalling 12 (RGS12) mRNA, complete cds | 0.56 | <NONE> | <NONE> | <NONE>
414 | US5660 | Human papillomavirus strain RTRX7 complete genome | 0.56 | <NONE> | <NONE> | <NONE>



Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
SEQ ID | ACCESSION | DESCRIPTION | P VALUE | ACCESSION | DESCRIPTION | P VALUE
415 | X57626 | M. javanica mitochondrion ATPase 6, and putative tRNA-f-Met and tRNA-His genes | 0.56 | <NONE> | <NONE> | <NONE>
416 | AB003363 | Sus scrofa S100C gene, complete cds | 0.56 | <NONE> | <NONE> | <NONE>
417 | L42291 | Danio rerio DANA element, intron 4. | 0.56 | 2650002 | (AE001062) conserved hypothetical protein [Archaeoglobus fulgidus] | 8.7
418 |  | Mus musculus leukocystatin gene, complete cds | 0.56 | 462493 | L-LACTATE DEHYDROGENASE (IMMUNOGENIC PROTEIN P36) >gi|479296|pir||S33362 L-lactate dehydrogenase (EC 1.1.1.27) - Mycoplasma hyopneumoniae | 6.7
419 | U17068 | Pennisetum glaucum Ac-like element, AcL2. | 0.56 | 399449 | ESCARGOT/SNAIL PROTEIN HOMOLOG | 6.7
420 | Z48042 | H.sapiens mRNA encoding GPI-anchored protein p137 | 0.56 | 141232 | HYPOTHETICAL 8.7 KD PROTEIN (READING FRAME D) >gi|76316|pir||QQSA7C hypothetical protein E-74 | 6.7
421 | AF027657 | Choristoneura fumiferana entomopoxvirus nucleotide triphosphate phosphohydrolase I (NPHI) gene, complete cds | 0.56 | 464999 | PUTATIVE ACETYLCHOLINE REGULATOR UNC-18 >gi|480359|pir||S36747 acetylcholine regulator unc-18 - Caenorhabditis elegans >gi|247392|bbs|100294 putative acetylcholine regulator unc-18 | 5.1
422 | AB011540 | Homo sapiens mRNA for MEGF7, partial cds | 0.56 | 1718033 | URACIL-DNA GLYCOSYLASE (UDG) herpesvirus 2 >gi|695219 (U20824) uracil DNA glycosylase | 5.1
423 | X59941 | X.maculatus NGF gene for nerve growth factor | 0.56 | U69081 | COMMON PLANT REGULATORY FACTOR CPRF-1 >gi|515621 (X58575) light-inducible protein CPRF-1 [Petroselinum crispum] >gi|149S301 (U46217) CPRF1 | 3.S



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SEQ 
ID



424 



425 



426 



428 



431 



433



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



M72711



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



Rat transcriptional 
repressor of myelin- 
specific genes (SCIP) 
mRNA. complete cds. 



0.56 



AL023850 



U47862 



Caenorhabditis 
elegans cosmid 
Y67D11A, complete 
sequence 
[Caenorhabditis 
elegans] 



0.56 



Schistosoma mansoni 
gynecophoral canal 
protein mRNA, 
complete cds 



0.56 



427 I V00574 



Z71502 



429 I M37278 



Human germ line 
gene homologous to 
bladder carcinoma 
oncogene T24 (Gene 
code c-Ha-ras-1) with 
four exons.



0.56 



gene



430 | D28878 



Z15027



432 | M14362



Z50S01 



R.norvegicus renin
gene, exons 1-9.

Thermus

thermophilus polA
gene for thermostable
DNA polymerase I,
complete cds



0.56 



0.56 



H.sapiens HLA class 
III DNA 



Human T-cell surface 
antigen CD2 (T11)
mRNA. complete cds. 



Z.mays mRNA for 
chlorophyll a/b- 
binding protein CP29



0.56 



0.56 



0.56 



0.56 



501027 



266771 



2147138 



1518672 



1651674 



2853019 



3659692 



DESCRIPTION 



1304141 



2462979 



P VALUE 



(U01849) ORF2 [Trypanosoma
brucei]



CHORISMATE MUTASE

(CM) / PREPHENATE
DEHYDRATASE (PDT) (P-
PROTEIN)

>gi|281791|pir||S26053
chorismate mutase (EC 5.4.99.5)
/ prephenate dehydratase (EC
4.2.1.51) - Erwinia herbicola
>gi|43344



ATP synthase chain 6 - 
Platymonas subcordiformis 
mitochondrion >gi|633582 
(Z47797) ATP synthase subunit 
6 [Platymonas subcordiformis] 



109677 



(U60289) receptor protein 
tyrosine phosphatase psi [Homo 
sapiens] 



(D90899) ferrichrome-iron

receptor

(AF045141) putative serine 
proteinase [Scirpophaga 
incertulas] 



2.3 



(D43758) fibrinogen A-alpha-
chain 



(Y11915) Tenascin-X [Bos
taurus] 



collagen alpha 1(1) chain 
mouse >gi|504S7 



2.3 



2.3 



1.7 



1.3 



1.0 



(AF06874S) sphingosine kinase
[Mus musculus] | 0.77



0.76 



0.59 



0.50 



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SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



1 



DESCRIPTION 



P VALUE 



434 | Z38114



435 



436 



437 



438 



440 



441 



442 



443 



444 



AF052254 



AF080649 



S.cerevisiae 
chromosome XIII 
cosmid 9745 



AJ005690 



Escherichia coli DNA 

gyrase A (gyrA) gene, 

partial cds

Tegula pulligo 12S

small subunit 

ribosomal RNA gene, 

mitochondrial gene 

for mitochondrial 

*NA, partial 

sequence 



Danio rerio mRNA 
for protein tyrosine 
kinase 



Human noggin 
(NOGGIN) gene, 
U31202 | complete cds.



Ovis sp. trichohyalin 
439 X51695 |mRNA, partial 



U2S938 



Rattus norvegicus 
protein tyrosine 
phosphatase D30 
mRNA, complete cds



AE001171



Borrelia burgdorferi 
(section 57 of 70) of 
the complete genome 



Caenorhabditis 
elegans cosmid 
AF036685 C05B10 



X0U73 



Xenopus laevis 

vitellogenin gene A1

5' flanking region
- 



Mus musculus DNA
for MS2 protein,
D10911 | complete cds



0.56 



0.56 



0.56 



0.56 



0.56 



0.56 



0.56 



0.56 



0.56 



0.56 



0.55 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



140372 



3913223 



2623830 



3875475 



3386622 



3293547 



2315521 



1519671 



DESCRIPTION 



HYPOTHETICAL 86.0 KD



P VALUE 



PROTEIN IN GLK1-SRO9
INTERGENIC REGION 
>gi|83159|pir||S 19367 
hypothetical protein YCL039w 
yeast (Saccharomyces 
cerevisiae) 



<NONE> 



CYCLIN-DEPENDENT 
KINASE INHIBITOR 1
p21/WAF1 [Felis catus]



(AF030962) unknown 
[Schistosoma mansoni]



(Z78411) F02D8.3
[Caenorhabditis elegans] 



(AC004665) unknown protein 
[Arabidopsis thaliana] 



(AF072709) putative 
oxidoreductase [Streptomyces
lividans]



(AF016452) similar to the beta 
transducin family 



(U67951) contains similarity to 
ATP/GTP-binding site motif A
(PS:PS00017) [Caenorhabditis
elegans]



(U41558) K02B2.3 gene
product [Caenorhabditis



1 1 IS 102 elegans]



<NONE> 



035 



(AF038535) synaptotagmin VII 
2724126 | [Homo sapiens] | 0.12



0.11 



7e-06 



1e-06



1e-10



1e-14



4e-16



6e-20 



2e-31 



<NONE> 



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SEQ 
ID 



451 



454 



455 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



445 | D30010



446 | U51991



447 | M18858



448 | U95218



449 | M14948



450 | AB002353



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
ACCESSION 



DESCRIPTION 



Rice mRNA EN117,



partial sequence | 0.55
Escherichia coli
phosphoprotein

phosphatase | 0.55

Mouse T cell receptor
C-gamma-7.1 mRNA,

3' end. | 0.55



Homo sapiens T cell- 
death associated 
protein gene. 
complete cds 
Human R-ras gene,
exon 1. 



0.55 
0.55 



Human mRNA for
KIAA0355 gene, 
complete cds 



0.55 



L816S9 



452 | M6S955 



Homo sapiens
(subclone l_d6 from
PI H54) DNA

sequence | 0.55



Human myristoylated 
alanine-rich C-kinase 
substrate (MACS) 
gene, 5' end. | 0.55 



453 | X62953



R.norvegicus mRNA 
(pJGI16) with 
repetitive elements 



L34630 



U43521 



Synechocystis sp.
mntABC transporter 
system: periplasmic- 
binding protein 
(mntC), complete cds: 
(mntA) gene, 
complete cds; 
membrane protein 
(mntB) gene, 
complete cds 



Plasmodium berghei 
merozoite surface 
protein-1 gene,
complete cds 



0.55 



0.55 



0.55 



<NONE> 



<NONE> 



<NONE> 



<NONE> 
<NONE> 



<NONE> 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 
<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 
<NONE> 



<NONE> 



<NONE> 



3322710 



1076802 



(AE001220) V-type ATPase, 
subunit B (atpB-1) [Treponema 
pallidum] 



extensin-like protein - maize 
>gi|600118 mays]



5.0 



5.0 



2117632 



hydrogen dehydrogenase (EC 
1.12.1.2) - Clostridium
acetobutylicum >gi|557064
(U15277) hydrogenase I
[Clostridium acetobutylicum] 



127654 



MYOGLOBIN 



5.0 



4.9 



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J Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















456 


Z64937 


H. sapiens CpG DNA,
clone 17g7, reverse
read cpg17g7.rt1a


0.55


417298 


MFS18 PROTEIN
PRECURSOR 




457 


U10914 


Macaca mulatta clone 
irh83 T-cell receptor 
alpha chain mRNA, 
partial cds. 


0.55 


310406 


(L09212) tat protein [Simian
immunodeficiency virus]


3.8 


458 


AF022838 


Homo sapiens 
multidrug resistance

protein 


0.55 


1585251 


traB gene [Amycolatopsis
methanolica]


2.8 


459 


M35603 


Mouse Hox-3.1 gene 
and Hox-3.2-Hox-3.1
intergenic region. 


0.55 


818849 


(U25430) nucleotide 
pyrophosphatase precursor 
[Oryza sativa] 


2.0 


460 


AE001395 


Plasmodium 
falciparum
chromosome 2,
section 32 of 73 of
the complete
sequence


0.55 


137532 


PROTEIN C2 

>gi|74386|pir||WZVZB6 59K 

HindIII-C protein - vaccinia

virus (strain WR) 


1.7 




AE001395


Plasmodium 
falciparum 
chromosome 2, 
section 32 of 73 of 
the complete 


0.55




PROTEIN C2 

>gi|74386|pir||WZVZB6 59K 
HindIII-C protein - vaccinia


1.7


462 


U59736 


Human transcription 
factor (NFATc.b) 
mRNA, complete cds 


0.55 


3327144 


(AB014565) KIAA0665 protein
[Homo sapiens] 


0.096 


463 


U34860 


Saccharomyces 
cerevisiae origin 
recognition complex 
large subunit (ORC1) 
gene, complete cds 


0.55 


140372 


HYPOTHETICAL 86.0 KD
PROTEIN IN GLK1-SR09
INTERGENIC REGION
>gi|83159|pir||S19367
hypothetical protein YCL039w -
yeast (Saccharomyces 
cerevisiae) 


0.017 


464 


AF012341 


Homo sapiens 
glutaryl-CoA 
dehydrogenase 
(GCDH) gene, exons
6, 7, 8, 9, and 10


0.55 


1166611 


(U46674) coded for by C.
elegans cDNA yk27d9.5; coded
for by C. elegans cDNA
yk27d9.3; short region of weak
homology to drosophilia
suppressor of sable protein 


O.OOS 



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SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



DESCRIPTION 



P VALUE 



465 



AF004891 



HIV- 1 isolate Q98- 



CxA from Kenya, 
envelope 

glycoprotein C2V3 
region (env) gene, 
partial cds 



0.54 



<NONE> 



<NONE> 



<NONE> 



466 



YI0159 



D.discoideum 
racGAP gene 



1 uvvru 5^'«^ 

Homo sapiens mRNA 
for B120, complete 

cds 



0.54 



<NONE> 



<NONE> 



<NONE> 



467 



AB001895 



0.54 



<NONE> 



<NONE> 



<NONE> 



468 



469 



X12357



Bovine gene for
aspartyl protease
NM1 exons 3 and 4 >
: lcl|X12357 Bovine
aspartyl protease 
NM1 gene, exons 3 
and 4. 



AE001151 



Borrelia burgdorferi 
(section 37 of 70) of
the complete genome 



0.54 



<NONE> 



<NONE> 



0.54 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



470 



X92052 



H.sapiens mRNA for 
T cell receptor alpha 
chain 



0.54 



<NONE> 



<NONE> 



<NONE> 



471 



U00938 



Mus musculus ileal 
lipid-binding protein
gene, complete cds



0.54 



1009712 



(U27698) calreticulin 
[Arabidopsis thaliana]



4.9 



472 



473 



X68367 



M.thermoformicicum
complete plasmid
pFZ1 DNA



0.54 



125272 



CASEIN KINASE II, ALPHA
CHAIN (CK II)
>gi|419938|pir||A43297 casein
kinase II (EC 2.7.1.-) alpha
chain - Theileria parva
>gi|161871 (M92084) casein
kinase II alpha subunit
[Theileria parva]



Z61098 



H.sapiens CpG DNA, 
clone 44c4, reverse 
read cpg44c4.rt1a



0.54 



4191274 



(AJ131094) Xvent-1B protein
[Xenopus laevis]



4.7 



3.7 



474 



475 



M63962 



Human gastric H,K- 
ATPase catalytic 
subunit gene, 
complete cds. 



0.54 



3881648 



(Z70757) similar to serine
protease inhibitor
[Caenorhabditis elegans]



XS6019 



H.sapiens mRNA for
RPL-2 protein 



0.54 



164882S 



(D87963) ETF-related factor- 1 
(ETFR-1) 



3.7



2.1 



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I Nearest Neighbor (BlastN vs. Genbank) 



SEQ 
ID 



476 



477 



ACCESSION 



478



479 



480 



481 



482



483 



484 



DESCRIPTION 



S.glaucescens genes



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



X89010 



strU, strX, strV and 
strW for 5'- 
hydroxystreptomycin 
production and
transport
polypeptides



Homo sapiens mRNA 
AB007836 for Hic-5. partial cds 



DESCRIPTION 



U32622 



Comamonas 
testosteroni TsaR 
(tsaR), 

toluenesulfonate 
methyl- 

monooxygenase 
oxygenase component
component (tsaB),
toluenesulfonate zinc-
independent alcohol
dehydrogenase.



D61394 



Arabidopsis thaliana 
gene for beta-VPE, 
complete cds



D61394 



Arabidopsis thaliana 
gene for beta-VPE, 
complete cds



Z33072 



U45975 



Z71324 



L32090 



M.capricolum DNA 
for CONTIG MC097 



Human 

phosphatidylinositol
(4,5)bisphosphate 5-
phosphatase homolog
mRNA, partial cds.



S.cerevisiae 
chromosome XIV 
reading frame ORF 
YNL04Sw 



Listeria 

monocytogenes secA
gene, complete cds. 



0.54 



0.54 



0.54 



0.53 



0.53 



0.53 



0.53 



0.53 



0.53 



3550345 



1097213 



3875351 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



21355S6 



2291129 



(AF084524) cellular repressor 
of E1A-stimulated genes CREG
[Mus musculus] 



P VALUE 



ORF 1 [Streptomyces 
lavendulae] 



0.25 



0.15 



(Z96047) DY3.6
[Caenorhabditis elegans]



<NONE> 



0.006 



<NONE> 



<NONE> 



<NONE> 



M130 antigen (cytosolic variant
1) - human 



(AF016415) No definition line
found [Caenorhabditis elegans]



<NONE> 



<NONE> 



<NONE> 



<NONE> 



2.1 



0.70 






PCT/US00/18374



SEQ 
ID 



491 



492 



493 



494 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



485 D86423 



486 Y15969



Mus musculus mRNA
for HGT keratin,
partial cds
Mus musculus V 
kappa 21-6 gene, 
partial 



0.53 



0.52 



487 M27480 



Mus musculus (clone 
3F9) transcribed 
germline T cell 
receptor gamma chain
(Tcr-g) mRNA, VJ4 
C4 region. 



0.52 



488 D87004 



Human (lambda) 
DNA for 

immunoglobulin light
chain 



489 | Z99704



490 U76523 



Human DNA
sequence from
cosmid E75B8 on
chromosome 22,
complete sequence
[Homo sapiens]

Sambucus nigra lectin
precursor mRNA,
complete cds



0.52 



0.51 



0.51 



U32795 



Haemophilus 

influenzae Rd section 
110 of 163 of the
complete genome



Human myoglobin
M14602 gene, exon 2.



Human mRNA for
KIAA0238 gene,
D87075 partial cds



0.50 



0.49



U95102 



Xenopus laevis 
mitotic 

phosphoprotein 90 
mRNA, complete cds



0.24 



0.23 



DESCRIPTION 



1235974 



<NONE> 



3875542 



1766073 



P VALUE 



(X96713) collagen [Globodera
pallida] | 0.41



<NONE> 



<NONE> 



(Z67990) Similarity to Rat
amiloride-sensitive sodium
channel beta-subunit



4.6 



(U37272) winged helix protein 
CWH-1 [Gallus gallus] 



3.5 



<NONE> 



<NONE> 



<NONE> 



478384 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



1938429 



<NONE> 




helicase homolog g10L protein -
African swine fever virus
>gi|414091 (X72951) G10L 125

KDa protein | 7.0

2) similar to

Schizosaccharomyces pombe 4-
nitrophenylphosphatase
(PNPPASE) (SP:Q00472,
NID:g5004) [Caenorhabditis
elegans] | 2.5



<NONE> 



<NONE> 






SEQ
ID


Nearest Neighbor (BlastN vs. Genbank)


ACCESSION


DESCRIPTION


P VALUE


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


ACCESSION


DESCRIPTION


P VALUE


495 


N.crassa

mitochondrial small
(19S) rRNA and Cys
J05254 tRNA.


0.23 


I 192150 


(L05670) clustrin [Mus 
musculus] 


5.1 


496


Gene for glutamate
dehydrogenase (EC
1 1 .•+. pui. uaciena
X16399 origin


1 1 

0.23 1 790933 


(L07867) invariant surface 
glycoprotein [Trypanosoma 
brucei] 


0.030 


497 


| Treponema pallidum 

II lc#»rtirtn fH r\f (27 r\f 

n ijccuun O/ Of 0/ OI 

I AE001251 the complete genome 


0.22 


<NONE>


<NONE> 


<NONE>


498 


1 Homo sapiens 

[amyloid lambda light 
chain variable region 

AF09fiQIO Ldma nirti'.>l /'He 
\j£.\jyiy |ITll\lN/\, paniul COS 


0.21 



<NONE> 


<NONE> 


<NONE> 


499 


D.melanogaster
Z27247 mRNA for defensin


0.21 


<NONE> 


<NONE> 


<NONE> 


500 


[Candida albicans 
Y15608 IuBIj 2 ene 


0.21 


<NONE>


<NONE> 


<NONE> 


501 


Human beta-tubulin
V00598 pseudogene.


0.21 | <NONE>


<NONE> 


<NONE> 


502 


A.thaliana
microsatellite
repeated motif
X79426 (o a 07] 


0.21 


<NONE> 


<NONE> 


<NONE> 


503 


A.caerulescens
mitochondrial genes
for cytochrome b and

nadh

X75772 dehydrogenase 5


0.21 


139626 


PROTEIN Tl PRECURSOR 


7.S 


504 


Serratia marcescens
site specific 
AF028736 recombinase 


0.21 


3645960


(AL031583) 1-

evidence=predicted by content;
1-method=genefinder;084; 1-
method_score=47.46; 1-
evidence_end; 2-
evidence=predicted by match; 2-
match_accession=SWISS-
PROT:P23792; 2-
match_description=DISCONNE
CTED PROTEIN.; 2-matc...


4.6 


505 


S.cerevisiae OST5
X97545 gene


0.21 


2275631


(AF014940) No definition line
found [Caenorhabditis elegans]


2.7 








Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ
ID


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 



507 



508 



510 



511 



512 



513 



515 



506 | M24543



M62470



Y13544



509 | M24193



X97161 



X67649 



U63807 



Z95636 



514 | U91907



Z35758 



516 X00334 



Human prostate 
specific antigen (PA) 
gene, complete cds.



Mouse

thrombospondin
(THBS1) gene,
complete cds. 



Homo sapiens cosmid 
CI 



Chicken MHC B 
complex protein (C12 
3) mRNA, complete 
cds. 



H.sapiens TFE3 gene, 
exon 4,5 & 6 



R.norvegicus DNA
sequence for 
LFB1/HNF1 
promoter 



0.21 



1938527 



0.21 



548563 



0.21 



1235710 



0.21 



3600102 



0.21 



854065 



Fugu rubripes growth
hormone (GH) gene,
complete cds



H.sapiens mRNA for
laminin alpha 5 chain

RE 



Mirounga leonina



major

histocompatibility
complex class II
(DQA) gene, partial
cds 



Transmissible
gastroenteritis virus
TFI virion protein
genes



Drosophila virilis
simple DNA
sequence (pDv-19) 



0.21 



3913114 



0.21 



3510505 



(U97012) C04E6.2 gene 
product [Caenorhabditis 
elegans]

NA replica: 

POL YPROTEIN 2.7.7.48) - 
Erysimum latent virus 
>gi|3892232 (AF098523) 
replicase protein [Erysimum 
latent virus] 



(L40584) polyprotein
[Infectious pancreatic necrosis 
virus] 



(AF090441) extracellular reelin
[Gallus gallus] 



(X83413) U88 [Human 
herpesvirus 6] 



TRANSCRIPTION FACTOR 
COUP 2 COUP-TFII - chicken 
>gi|392S17 (U00697) orphan 
receptor COUP-TFII [Gallus 
gallus] 



(AF0308SI) pol polyprotein
[Fugu rubripes]



0.21 



400350 



0.20 



<NONE> 



0.20 



<NONE> 



MEDIATED MRNA DECAY 
PROTEIN 1) (UP-
FRAMESHIFT SUPPRESSOR
1) factor NAM7 - yeast
(Saccharomyces cerevisiae) 
>gi|4023 



<NONE> 



<NONE> 



0.20 



<NONE> 



<NONE> 



2.7 



2.1 



2.0 



0.52 



0.30 



0.004 



3e-04 



1e-07



<NONE> 



<NONE> 



<NONE> 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



P VALUE 



Homo sapiens biliary 
glycoprotein (BGP) 
517 | M76741 gene, partial cds.



518 



Mus musculus rae28 
gene, exon 1 and 
D78515 5' flanking region



519 



520 



Drosophila 
melanogaster RNA
polymerase II second 
largest subunit 
upstream (DmRP 
M62975 1 140) gene, exons 1-4. 



Chicken 78-kD 
glucose-regulated
M27260 protein, complete cds.



521 



522 



Rice tungro 
bacilliform virus
Serdang strain,
AF076470 complete genome



523



Rice tungro
bacilliform virus
Serdang strain,
AF076470 complete genome
Human 

cyclooxygenase-2 
(hCox-2) gene, 



ACCESSION 



0.20 



<NONE> 



0.20 



<NONE> 



0.20 



0.20 



0.20 



<NONE> 



<NONE> 



0.20 



U04636 



524 



AE001430 



525 



AF043514 



complete cds. 
Plasmodium 



falciparum 
chromosome 2, 
section 67 of 73 of 
the complete 
sequence 



Mus musculus 
phosphomannomutase 
(Pmm2) mRNA, 
complete cds 



<NONE> 



<NONE> 



0.20 



<NONE> 



0.20 



0.20 



<NONE> 



3025006 



DESCRIPTION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



HYPOTHETICAL iS.S KD

PROTEIN IN MOAE-RHLE
INTERGENIC REGION
>gi|1787009 (AE000181) orf,
hypothetical protein 
[Escherichia coli] 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



9.S 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



U23144 



527 | U14621



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



Xenopus laevis FTZ-
F1-related nuclear
orphan receptor
variant (xFF1rAshort)
mRNA, complete cds.



0.20 



3184402 



Paracentrotus lividus
Pax-6 (suPax-6)
mRNA, complete cds.



0.20 



465894 



528 | AF030511



529 | AF070581



530 | X75437



531 | U32686



Actinobacillus 
pleuropneumoniae
MRP ATPase 
homolog (mrp) gene, 
partial cds; ApxIVA 
var3 (apxIVA) gene, 
complete cds; and 
beta-galactosidase 
(lacZ) gene, partial 
cds 

Homo sapiens clone 
24540 mRNA 

sequence 

T.maritima pgK gene 
for 3- 

phosphoglycerate 
kinase



0.20 



1175966 



0.20 



542394 



0.20 



825648 



DESCRIPTION 



Haemophilus
influenzae Rd section
of 163 of the 
complete genome 



0.20 



3309593 



532 | Z28081



S.cerevisiae
chromosome XI
reading frame ORF
YKL081w



0.20 



2507201 



P VALUE 



(AB014477) period protein
[Chymomyza costata]

SIGNAL PEPTIDASE 23 KD
SUBUNIT (SPC22/23)
>gi|630688|pir||S44854
K12H4.4 protein -
Caenorhabditis elegans 
>gi|289708 (L14331) homology 
with signal peptidase; coded for 
by C. elegans cDNAs GenBank 
M79661, M79662 and M79663; 
putative 



HYPOTHETICAL 45.3 KD 
PROTEIN IN THI5 5'REGION 
>gi|1084720|pir||S56193 
probable membrane protein 

(Saccharomyces cerevisiae) 
glyoxal oxidase (EC 1.2.3.-) 
precursor - basidiomycete 
(Phanerochaete chrysosporium) 
>gi|1050302



(Z34531) coproporphyrinogen
oxidase [Homo sapiens] 



(AF072S7S) ciliary outer arm 
dynein beta heavy chain 



CARBON CATABOLITE
DEREPRESSING PROTEIN
KINASE >gi|1469S03 (L78129)
serine/threonine kinase [Candida
albicans]



9.6 



7.7 



7.2 



5.8 



5.8 



5.6 



5.5 






Nearest Neighbor (BlastN vs. Genbank)



SEQ

ID | ACCESSION



533 



534 



536 



537 



538 



539 



DESCRIPTION 



Hordeum vulgare 



AF022725 



AL021726 



535 | AB012106



limit dextrinase 
(HvLD99) gene, 
complete cds 



Drosophila 
melanogaster cosmid 
171E4 



Brassica rapa mRNA
for SRK45, complete
cds 



Z46606 



X87193 



H.sapiens HLTF gene
for helicase-like 
transcription factor 



H.sapiens mRNA for 



2 ,19 gene 



L77965 



M15938 



540 | AJ003220



541 



Clostridium 
perfringens C beta 2 
toxin gene, complete 
cds 



Chicken neural cell- 
adhesion molecule (N-
CAM) gene, exon 18.



Solanum tuberosum
mRNA for extensin-
like protein, partial



Nearest Neighbor (BlastX vs. Non-Redundant Protei



P VALUE I ACCESSION 



r 



0.20 



0.20 



0.20 



0.20 



0.20 




0.20 



0.20 



0.20 



X98108 



A.thaliana psbP gene | 0.20



ns) 



DESCRIPTION 



3139154 



P VALUE 



(AF064077) adrenocorticotropic 
hormone receptor [Sus scrofa] 



3885334 



(AC005623) putative argonaute
protein [Arabidopsis thaliana]



4.3 



— 



2.6 



4008334 



132946 



139820 



1175950 



2133082 



2496932 



119227



(Z92824) B0413.4
[Caenorhabditis elegans]



60S RIBOSOMAL PROTEIN
L30B (RP29) cytosolic - yeast
(Saccharomyces cerevisiae)
>gi|171821 not determined)
[Saccharomyces cerevisiae]
>gi|1045254 cerevisiae
>gi|1323250|gnl|PID|e243708
(Z72933) ORF YGR148c
[Saccharomyces cerevisiae]



DNA-REPAIR PROTEIN
XRCC1 



HyPUlHLllL ' AL JJ.dKD 
PROTEIN IN SEC53-ACT1 
INTERGENIC REGION 
>gi|1084703|pir||S56211
probable membrane protein
YFL044c - yeast
(Saccharomyces cerevisiae)
>gi|836711|gnl|PID|d1009835
(D50617) YFL044C



regulatory protein MSR1 - yeast

Hr rtjTHtl'iOAL K£> 



PROTEIN C56G2.1 IN 

CHROMOSOME III 

>gi|726413 (U23177) C56G2.1
gene product [Caenorhabditis
elegans]



EPIDERMAL GROWTH
FACTOR PRECURSOR
precursor - mouse >gi|309210
(J00380) prepro-egf [Mus
musculus]



1.5 



1.5 



1.5 



1.4 



1.1 



1.1 



0.49 







ACCESSION 



DESCRIPTION 



542 | AB011179



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



543 | X75318



544 | AB008374



545 | U09809



546 | AB020671



Homo sapiens mRNA 
for KIAA0607 
protein, partial cds



H.sapiens ITIH1 gene
(exon 22) and ITIH3 
gene 



Oncorhynchus mykiss 
mRNA for alpha 3 
type I collagen, 
partial cds 



0.20 



2143753 



0.20 



629557 



Limulus polyphemus
arginine kinase 
mRNA, complete cds 



547 | L04457



Homo sapiens mRNA
for KIAA0864



protein, partial cds 
Phytophthora 



mitochondrial 
ORFI52, complete 
cds, cytochrome c 
oxidase subunit I 
(cox 1) gene, 
complete cds, 
cytochrome c oxidase 
subunit II 



0.20



1082610 



0.20 



3882016 



0.20 



2674350 



548 | L04457



Phytophthora
megasperma
mitochondrial
ORFI52, complete
cds, cytochrome c
oxidase subunit I
(cox I) gene,
complete cds,
cytochrome c oxidase 
subunit II 



0.20 



746516 



0.20 



746516 



DESCRIPTION 



P VALUE 



gene VGF protein - rat
>gi|205690 (M60525) nerve 
growth factor inducible protein 
[Rattus norvegicus] >gi|205701 
(M60522) nerve growth factor- 
inducible protein [Rattus 
norvegicus] >gi|207651
RNA-binding protein rnpD
Arabidopsis thaliana (fragment)
>gi|510240 (X61108) RNA
binding protein [Arabidopsis 
thaliana] 



mufl protein - human
>gi|762953 (X86018) mufl
[Homo sapiens]



(AJ012650) CP [Papaya
ringspot virus] 



(U93121) M-phase
phosphoprotein-1 [Homo
sapiens]



(U23517) D1022.7
[Caenorhabditis elegans]
>gi|3258651 elegans]



(U23517) D1022.7
[Caenorhabditis elegans]
>gi|3258651 elegans]



0.39 



0.38 



0.37 



0.37 



0.IS 



0.043 



0.043






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



(Jdlor=cyclin- 



549 



550 



S82819 



551 



D31792 



U97499 



dependent kinase 5 
regulatory subunit 
p35 [mice, brain, 
129/SvJ, C57Bl/6,
Genomic/mRNA,
5528 nt]



Streptomyces griseus
DNA for 
serine/threonine 
protein kinases, 
complete cds 



Homo sapiens
butyrophilin (BT3.2) 
gene, exons 5-10, and 
complete cds 



552 



U31463 



553 | X78401



Rattus norvegicus
nonmuscle myosin
heavy chain-A
mRNA, complete cds



Bacteriophage P22 
right operon, orf 48, 
replication genes 18 
and 12. nin region 
genes, ninG 
phosphatase, late 
control gene 23, orf 
60, complete cds, late 
control region, start 
of lysis gene 13 



X57310




X62386 



Nocardia 

lactamdurans pcbAB 
and pcbC genes for 
alpha-aminoadipyl-L- 
cysteinyl-D-valine
synthetase and 
isopenicillin N 
synthase 



S.epidermidis genes 
epiY'. epiY, epiA, 
epiB. epiC, epiD, 
epiQ. epiP 




PUTATIVE ENDONUCLEASE 
CIF12.06C yeast 
(Schizosaccharomyces pombe) 
>gi|12179S0 (Z69944) unknown 
[Schizosaccharomyces pombe]



3874927 



(Z73424) C44B9.1 
[Caenorhabditis elegans]



4e-09 



3e-10







SEQ 
ID 



561 | Z49509 



ACCESSION 



DESCRIPTION 



P VALUE 



556 | X59000



557 | M98776



558 | AF011446



Epizootic 

haemorrhagic disease 
virus gene segment 6 
for NS1



Human keratin 1
gene, complete cds
Mus musculus
granzyme K gene, 
complete cds 



0.20 



0.20 



0.19 



559 | AF074708



560 X13287 



562 | D89041 



Macaca mulatta clone
MMU1.5 FRG1-like
pseudogene, exons 7
and 8, partial
sequence 



0.19 



Medicago sativa
nodulin-25 gene



0.19 



S.cerevisiae 
chromosome X 
reading frame ORF 
YJR009c 



0.19 



Prostaglandin 
F2alpha receptor, 
partial cds 



564 | AE001461 



565 | L38559



Streptococcus 
salivarius DNA for 
dextranase
Helicobacter pylori, 
strain J99 section 22 
of 132 of the 
complete genome 
Homo sapiens 

galactocerebrosidase 
(GALC) gene, exon
17. 



0.19 



0.19 



0.19 



566 | Z8262S 



R.prowazekii
genomic DNA 
fragment (clone 
A405F) 



0.19 



0.19 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



3879755 



1086900 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



si 



ins) 



DESCRIPTION 



UUUnuj similar lo nucleotide" 



P VALUE 



binding protein; cDNA EST
EMBL:M75897 comes from this
gene; cDNA EST
EMBL:M89054 comes from this
gene; cDNA EST
EMBL:D26713 comes from this
gene; cDNA EST

EMBL:D26718 comes from this
gene; cDNA... 



8e-16



(U41278) contains similarity to 
G beta repeats | 2e-30 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 






Nearest Neighbor (BlastN vs. Genbank)



SEQ 
ID 



ACCESSION 



DESCRIPTION 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



P VALUE | ACCESSION



567 



568 



Tetrahymena 
thermophila 
telomerase 
component p80 
U25641 mRNA, complete cds | 0.19



Human mRNA for
KIAA0345 gene,
AB002343 complete cds



569 



D10064



Erwinia carotovora 
gene for pectate lyase 
complete cds 



570 



U31734



Homo sapiens clone
MF118 A4A10
hypoxanthine
phosphoribosyltransfe-
rase (hprt) 130 kb
deletion mutant 
mRNA, partial cds, 
contains human Alu 
element 



571 



572 



Plasmodium 
falciparum 
chromosome 2, 
section 23 of 73 of 
the complete 
AE001386 sequence



Homo sapiens
hydroxymethylbilane
synthase gene,
M95623 complete cds.



573 



S67478 



574 



X99075 



575 | AF044775



(GC*IS)=vitamin D 
binding protein/group 
specific component 
[human, peripheral 
blood leukocytes, 
Genomic, 794 nt, 
segment 4 of 9] 



H.sapiens NRGN 
gene, exon 1 



Homo sapiens 
breakpoint cluster 
region BCRderH 
sequence 



0.19 



0.19 



0.19 



0.19 



0.19 



0.19 



0.19 



0.19 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



DESCRIPTION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



576 



577 



578 



579 



580 



582 



AB002333 



U53566 



DESCRIPTION 



Human mRNA for 



KIAA0335 gene, 
complete cds 



U73664 



AF004054 



Z2708 I 




Z74145 



Macaca mulatta pit-
1/GHF-1

transcription factor
mRNA, complete cds



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Protei 



ACCESSION 



ins) 



DESCRIPTION 



0.19 



<NONE> 



0.19 



1078068 



Human 
t(11;14)(q13;q32)
breakpoint junction 
sequence 



Heterophyllaea
pustulata rps16 gene,
chloroplast gene, 
partial intron 
sequence 



Caenorhabditis 
elegans cosmid
M01A8, complete
sequence 
[Caenorhabditis 
elegans] 



S.cerevisiae 
chromosome IV 



D38547 



YDL097c 
Small round

structured virus 

genomic RNA, 

3'terminal sequence 

containing ORF2 and 

ORF3 



0.19 



116734 



0.19 



0.19 



0.19 



0.19 



1928991 



2496247 



1174425 



971318 



P VALUE 



<NONE> 



probable membrane protein 
YLR311c - yeast



COAT PROTEIN (CAPSID
PROTEIN) virus >gi|58901 
(X62133) CyMV coat protein 
gene product 



(U92815) heat shock protein 70 
precursor [Citrullus lanatus]



HYPOTHETICAL ATP-
BINDING PROTEIN MJ0625
>gi|2128413|pir||A6437S
hypothetical protein MJ0625 -
Methanococcus jannaschii
>gi|1591336 (U67510) M.
jannaschii predicted coding
region MJ0625



TYROSINE-PROTEIN
KINASE SPK-1



(Z48053) putative protein
[Bovine herpesvirus 1]



<NONE> 



9.2 



8.8 



8.7 



8.6 



5.1 






Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



583 



584 



585 



D88000 



RNA > :: 

dbj|D88002|D88002 
Ralstonia eutropha 
DNA for 16S 
ribosomal RNA > ::
dbj|D88003|D88003
Ralstonia eutropha
DNA for 16S
ribosomal RNA > ::
dbj|D88004|D88004
Ralstonia eutropha 
DNA for 16S 
ribosomal RNA 



Methanococcus
jannaschii section 4
of 150 of the
U67462 complete genome
Gallus domesticus
microsatellite DNA
L23906 marker.



Helicobacter pylori, 
strain J99 section 23 
of 132 of the 
586 | AE001462 complete genome



587 



588 



M19460



P.putida catBC
operon encoding
cis,cis-muconate
lactonizing enzyme I
and muconolactone
isomerase genes,
complete cds.



U22349 



Tetrahymena australis 
telomerase RNA 
gene, complete 
sequence



ACCESSION 



DESCRIPTION 



P VALUE 



0.19 



3800952 



0.19 



0.19 



3183617 



1947094 



0.19 



1730177 



0.19 



3873843 



0.19 



4105782 



(AF100657) No definition line 
found [Caenorhabditis elegans] 



(AJ005586) MYB-related 
transcription factor 
[Antirrhinum majus]

(U93074) voltage-gated sodium
channel homolog BdNal



GLUCOSE-6-PHOSPHATE
ISOMERASE (GPI)
ISOMERASE) (PHI)
>gi|2118333|pir||I43073 glucose
phosphate isomerase - Chinese
hamster >gi|987046 griseus]



IZS773oTctJN7TESt 
yk251g7.3 comes from this 
gene; cDNA EST yk251g7.5 
comes from this gene; cDNA 
EST EMBL:D68223 comes
from this gene; cDNA EST
EMBL:C12737 comes from this
gene; cDNA EST yk389cS.5 
comes from this gene; cDNA 
E... 



(AF049922) PGP169-12
[Petunia x hybrida]



5.1 



4.0 



3.9 



3.9 



3.2 







SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



589 | L27745



590 | AF049588 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)




Homo sapiens voltage 
operated calcium 
channel, alpha- 1 
subunit mRNA, 
complete cds. 



591 | X06627



Canis familiaris 
synapsin I gene, 
partial cds 



Staphylococcus 
aureus plasmid pS194
sequence



592 | X61597



593 | AF016242



M.musculus gene for
kallikrein-binding
protein



Dictyostelium
discoideum protein
synthesis elongation
factor 1-alpha (tef2)



gene, partial cds 
Venezuelan equine



594 | AF004447



encephalitis virus
strain 1327 
polyprotein gene, 
partial cds > :: 

gb|AF004460|AF004 
460 Venezuelan 
equine encephalitis 
virus strain 1385 
polyprotein gene, 
partial cds 



595 



J04S21 



596 | AF059650



Human elastin (ELN) 
gene, exon I, clones 
HELC-5 and HELC- 
6. 



Homo sapiens histone 
deacetylase 3 
(HDAC3) gene, 
complete cds 



0.19 



3763926 



0.19 



4104931 



(AC004450) unknown protein 
[Arabidopsis thaliana]



0.19 



137927 



0.19 



2982874 



(AE000675) cobalamin 
synthesis related protein CobW



0.19 



133659 



PUTATIVE RNA-DIRECTED
RNA POLYMERASE 



0.19 



4096173 



0.19 



1170523 



0.19 



3024881 



(U2596S) early embryogenesis
protein [Oryza sativa] 



INHIBIN BETA B CHAIN
PRECURSOR inhibin precursor

bovine >gi|563753 (U16241)
betaB inhibin/activin precursor
[Bos taurus]



PROBABLE TRANSPORT 
PROTEIN CY21CI2.11 

>gi|207SO66|gnI|PID]e31517l 
(Z952I0) betP 



3.0 



(AF042196) auxin response
factor 8 [Arabidopsis thaliana] | 3.0
PRE-NECK APPENDAGE
PROTEIN (LATE PROTEIN

GP12) >gi|75856|pir||WMBP22
gene 12 protein - phage phi-29
>gi|215330 (M14782) pre-neck
appendage protein
[Bacteriophage phi-29]

>gi|225367|prf||1301270G gene
12 [Bacteriophage phi-29] | 2.3



1.7 



1.4 



1.3 



1.3 



0.83 






Nearest Neighbor (BlastN vs. Genbank) 



SEQ 

ID | ACCESSION



DESCRIPTION 



P VALUE 



597 



D.melanogaster
calcium-activated K+
M69053 channel subunit



598 



Dictyostelium
firmibasis plasmid
Dfpl, complete
AF076279 plasmid sequence



599 



600 



D28873 



Mouse MCNP gene 
for C-type natriuretic 
peptide, complete cds 
(exon1, exon2)



601 



602 



Oxytricha nova 
macronuclear actin II
U06071 gene, complete cds.



Homo sapiens CLP
L54057 mRNA, partial cds.



P.lividus cDNA for
X89806 COLL2alpha gene



603 



AE001104 



Archaeoglobus 
fulgidus section 3 of 
172 of the complete 
genome



604 



Rattus norvegicus 
microsatellite 
U54501 sequence D0Mco22 



605 | X74468 



606 



U20285 



607 | D49408 



Human 

papillomavirus type 
15 genomic DNA 



0.19 



0.19 



0.19 



0.19 



0.19 



0.19 



0.19 



0.19 



0.19 



Human Gpsl (GPS1) 
mRNA, complete cds



Human gene for
interleukin 3 receptor
alpha subunit, exon 
10 



0.19 



0.19 



7 




Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION



1707984 



453986 



2650444 



1584024 



3036883 



3638957 



2315192 



223951 



3695390 



DESCRIPTION 



2582659 



252236S 



FERREDOXIN-DEPENDENT



P VALUE 



GLUTAMATE SYNTHASE 1
(FD-GOGAT) 

>gi|2126524|pir||S60228 
glutamate synthase (ferredoxin) 
(EC 1.4.7.1) gltB- 
Synechocystis sp. (PCC 6803) 
>gi|515938 (X80485) glutamate 
synthase 



(U00008) yejA [Escherichia
coli]



(AE001092) acetyl-CoA
synthetase (acs-1)
[Archaeoglobus fulgidus]



0.80 



0.79 



complement control protein 
[Botryllus schlosseri] 



(AL022374) putative ABC 
transporter 



(AC004877) sco-spondin -mucin 
ike; similar to P98167 uncertain 
Homo sapiens] 



(Y11739) transcription factor
[Homo sapiens]



D-MeAsp

receptor:ISOTYPE=epsilon3
[Mus musculus]



(AF096371) contains similarity
to Rattus norvegicus cyclin G-
associated kinase (SW:P97874)
[Arabidopsis thaliana]



0.63 



0.48 



0.46 



0.41 



0.35 



0.32 



(AJ002527) glucitol-6-
phosphate dehydrogenase
[Clostridium beijerinckii]



(AF008596) alpha 1,3-
fucosyltransferase [Helicobacter
pylori]



0.28 



0.27 



0.16 






SEQ 
ID 



611 



617 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



608 | AF041141



609 | L12531 



610 | AF052445



Homo sapiens
pituitary specific
homeodomain protein
(PROP1) gene, exon
3 and complete cds
Discopyge ommata
Ca2+ channel alpha 1
subunit gene
sequence.



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
ACCESSION 



Yellow fever virus 
clone HONG9 
polyprotein gene, 

complete cds 

B.anthracis sap gene
encoding S-layer
Z36946 | protein

Homo sapiens full
length insert cDNA
612 | AF087984 | clone YW29A12



Archaeoglobus
fulgidus section 97 of
172 of the complete
613 | AE001010 | genome



0.19 



0.19 



0.19 



0.19 



0.19 



0.19 



614 L08965 



Trichosporon 

phosphate synthetase 
large subunit (argA) 
gene, partial cds. 



615 I M91466 



Rattus norvegicus
A2b-adenosine
receptor mRNA,

complete cds.



0.19 



0.19 



616 | X95971 



U12539 



S.lividans groEL2
gene



DESCRIPTION 



37403 



0.19 



Schizosaccharomyces
pombe scd2 (scd2)
gene, complete cds.



0.19 



3618274 



1932822 



173241 



P VALUE 



(X03541) trk gene product (aa 1-
641) [Homo sapiens]



XT223219) hypothetical protein 



3786014 



3135493 



1086901 



2984320 



(U15928) KH-domain putative
RNA binding protein

(L06487) ZIP1 protein 
[Saccharomyces cerevisiae] 



(AC005499) hypothetical 
protein [Arabidopsis thaliana] 



(AF060248) unknown
[Arabidopsis thaliana]



0.091 



0.069 



(U41278) F33G12.3 gene
product [Caenorhabditis 
elegans] 



3925277 



1938549



(AE000773) acetoin utilization
protein [Aquifex aeolicus]

similar to 
Uncharacterized protein family 
UPF0034, Double-stranded 
RNA binding motif; cDNA EST 
yk489b3.5 comes from this 
gene; cDNA EST yk439g7.5 
comes from this gene 
[Caenorhabditis elegans]



(U97016) similar to drosophila 
Rlcl gene product ribosomal 
protein L4 (YML4) 
(NID:g459259)



0.001 



2e-04 



1e-06



7e-08 



2e-08 



6e-09 



7e-10 



3e-14 






Nearest Neighbor (BlastN vs. Genbank)



SEQ 

ID | ACCESSION



618 



621 



622 



623 



DESCRIPTION 



P VALUE 



U12539 



Schizosaccharomyces 
pombe scd2 (scd2) 
gene, complete cds, 



619 Z68327 



Human DNA 
sequence from 
cosmid U25DI 1, 
between markers 
DXS366 and DXS87
on chromosome X. 



620 | U66525



Dictyostelium 
discoideum 
ORFvegll4 mRNA, 
complete cds 



U25830 



U89407 



Newcastle disease 
virus isolate Herts/33 
matrix protein 
mRNA, complete cds 



AF095598 



Mus musculus strain 
BALB/c delta- 
aminolevulinic acid 
dehydratase (Lv) 
mRNA, partial cds 



Bison bison 
athabascae 
microsatellite BBJ 2 



624 | AF064260



Strongylocentrotus 
purpuratus SRC8 
mRNA, complete cds



0.19 



0.19 



0.19 



0.19 



0.19 



0.18 



0.18



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



(U97016) similar to drosophila



P VALUE 



1938549 



3875774 



3540281 



2228750 



1825764 



<NONE> 



<NONE> 



yi4 



RIcl gene product ribosomal 
protein L4 (YML4) 
(NID:g459259)

hi-uu-'j j j lulu i lu l 



EMBL:D32434 comes from this 
gene; cDNA EST 

EMBL:D33710 comes from this
gene; cDNA EST 

EMBL:D34467 comes from this 
gene; cDNA EST 
EMBL:D35005 comes from this
gene; cDNA EST 

EMBL:D37535 comes from this 
gene; ... 

>gi|387S710|gn!|PED|el34S373 
EST EMBL:D33710 comes 
from this gene; cDNA EST 
EMBL:D34467 comes from this 
gene; cDNA EST 

EMBL:D35005 comes from this 
gene; cDNA EST 

EMBL:D37535 comes from this 
gene; ... 



9e-15 



(AF056116) AIM related 
protein [Fugu rubripes]



6e-15 



2e-17



(U93868) RNA polymerase III 
subunit [Homo sapiens] 



(U88314)C46H11.11 gene 
product [Caenorhabditis 
elegans] 



1e-18



<NONE>



<NONE>



3e-25 



<NONE> 



<NONE> 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



625 | U69533



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



626 | D89041



627 | M24571



Arabidopsis thaliana 
AtKAP alpha mRNA, 
complete cds 
Bovine DNA for 
prostaglandin 
F2alpha receptor, 
partial cds 



0.18 



628 | X59772



629 | AL010209



630 | U67575



631 | U23730



632 | X99798



Dictyostelium 
discoideum tRNA- 
Glu-GAA gene, clone 

GluGAA7. 
D.melanogaster ovo 
gene required for 
female germ line 
development 



0.18 



0.18 



0.18 



Plasmodium 
falciparum DNA *** 
SEQUENCING IN 
PROGRESS *** 
from contig 3-104, 
complete sequence | 0.18 



633 | AF025306 



Methanococcus 
jannaschii section 117
of 150 of the
complete genome
Caenorhabditis 

K10B2 



L.lactis pepF1 &
pepF2 genes 



0.18 



0.18 



0.18 



Danio rerio band 4.1- 
like protein 4 (nbl4) 
mRNA, complete cds 



0.18 



634 | AF059251



635 1 222605 



636 | AB0110S6 



Mus musculus 
lipoxygenase (alox) 
mRNA, complete cds



G.domesticus CTCF 
protein mRNA 



0.18 



0.18 



Homo sapiens mRNA
for KIAA0514

protein, complete cds | 0.18



ns) 



DESCRIPTION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



111839 



1787604 



3406624 



465445 



1 655667 



4S1S64 



3874 1 58 



311 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



inositol 1,4,5-triphosphate 
receptor 2 - rat 



/ T J f ~ ~* » ' 1 

protein [Escherichia coli] 



(AF079110) glycosomal malate 
dehydrogenase [Trypanosoma 
brucei] 



PROBABLE NUCLEAR - 
ANTIGEN herpesvirus I (strain 
Kaplan) >gi|334072 (M34651)
ORF-3 protein [Pseudorabies
virus] 



<NONE> 



<NONE> 



8.5 



8.3



8.1 



(ZS1368) hypothetical protein 
Rv2393 



3-methyl-2-oxobutanoate 
dehydrogenase 



(Z81464) predicted using
Genefinder 



7.9 



6.6 



6.6



6.4 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



637 



Z78536 



638 



Caenorhabditis 



elegans cosmid
C07A4, complete
sequence
[Caenorhabditis
elegans]



U67530 



Methanococcus 
jannaschii section 72 
of 150 of the 
complete genome 



639 



M63781 



640 



Influenza
A/Duck/England/1/62
(H4N6) nucleoprotein
mRNA, complete cds 



M73781 



641 | X67219



642 



Oryctolagus 
cuniculus integrin 
beta-8 subunit 
mRNA, complete cds 
:: gb|I44828|I44828 
Sequence 3 from 
patent US 5635601 



D.melanogaster Rop 
gene 



AF106941



643 



AF052602 



Homo sapiens beta- 
arrestin 2 mRNA, 
complete cds 



Danio rerio 
huntingtin (HD) 
mRNA, complete cds 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



P VALUE | ACCESSION



DESCRIPTION 



0.18 



3702121 



0.18 



3877946 



0.18 



3873663 



0.18 



GM8 



0.18 



I362I29 



3449286 



P VALUE 




(AJ011681) retinoblastoma-
related protein [Chenopodium 
rubrum] 



(ZHIUM) Weak similarity to 65
KDA heat shock protein
(TR:G602231); cDNA EST
EMBL:D71705 comes from this
gene; cDNA EST
EMBL:D74382 comes from this

gene [Caenorhabditis elegans]
TLWbM) cDNA EST

EMBL:D71510 comes from this
gene; cDNA EST
EMBL:C08449 comes from this
gene; cDNA EST yk266b12.3
comes from this gene; cDNA
EST yk266b12.5 comes from
this gene; cDNA EST
yk461h7.3 comes from this
gene; cDNA...



major allergen OLE17 
common olive 



0.18 



54835; 



(AB011527) MEGF1 [Rattus
norvegicus] 



24 1058 



[PROTEIN-PII]

URIDYLYLTRANSFERASE
vinelandii >gi|39257 (X59610)
uridylyl transferase



potential IGF binding protein 
[chickens, Peptide Partial, 77 aa,
segment 2 of 3]



6.4 



6.3 



6.2 



5.8



4.8 



3.7 



3.6 






SEQ 
ID 



646 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



P VALUE 



Homo sapiens mRNA
for KIAA0902
644 | AB020709 | protein, complete cds



HIV-1 isolate patient
3 country USA pol
polyprotein (pol)

645 | AF096383 | gene, partial cds



Pyrocoelia miyako
(clone pB-PmL41)
luciferase mRNA,
L 3992 3 | complete cds



647 I MI 7082 



Human

carcinoembryonic
nonspecific
crossreacting antigen

(CEA; NCA) gene,
exons 1 and 2.



648 | X75318



(exon 22) and ITIH3 
gene 



649 | AF011908



Mus musculus 
apoptosis associated 
tyrosine kinase 
(AATYK) mRNA,
complete cds 



650 | U04004 



651 1 US3I55 



Simian

immunodeficiency
virus SIVagmVER-2 
envelope protein 
gene, partial cds. 



Xenopus laevis 
RanGTPase 
activating protein 



0.18 



0.18 



0.18 



0.18



0.18 



0.18 



0.18



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



0.18



ACCESSION 



3875570 



2914702 



1351833 



629557 



330442 



DESCRIPTION 
U.oeji4) predicted using
Genefinder; cDNA EST
EMBL:M75775 comes from this
gene; cDNA EST
EMBL:M89255 comes from this
gene; cDNA EST
EMBL:M89127 comes from this
gene; cDNA EST
EMBL:T00141 comes from this
gene; cDNA EST EMBL:T...



P VALUE



2.1 



3250696 (AL024436) putative protein | 1.7 



(AC003974) unknown protein
[Arabidopsis thaliana]



0.73 



REGULATORY PROTEIN
ABAA 



kNA-binding protein rnpD^ 
Arabidopsis thaliana (fragment) 

binding protein [Arabidopsis 
thaliana] 



0.72 



0.41 



135102 



995714 



(K03332) nuclear antigen 2 
[Epstein-Barr virus]

SYNTHETASE aspartate--
tRNA ligase (EC 6.1.1.12) -
Escherichia coli coli]

>gi|1736513|gnl|PID|d1016401
(D90829) Aspartate-tRNA
ligase (EC 6.1.1.12)
[Escherichia coli]



(X91258)pid:eI9S503 
[Saccharomyces cerevisiae]



5e-04 



6e-11



2e-13 






Nearest Neighbor (BlastN vs. Genbank)



SEQ 
ID 



652 



ACCESSION 



DESCRIPTION 



Z18921



653 | M60650



654 | U80912



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



B.oleracea gene for S
receptor kinase-like
protein



S.cerevisiae STA2
gene, complete cds.



Eucalyptus globulus
NADP-isocitrate
dehydrogenase 
(EglCDH) mRNA, 
complete cds 



655 



AF012899



656 | AF027174



657 



Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds 



Arabidopsis thaliana
cellulose synthase
catalytic subunit (Ath-B)
mRNA, complete
cds



AF030231 



658 | M19183



659 U31557 



Glycine max sucrose 
synthase (SS) mRNA,
complete cds 



Woodchuck hepatitis 
virus (WHV), 
complete genome, 
clone WHV 59. 



Ovine adenovirus
IVa2 protein gene,

DNA polymerase
gene, terminal protein
gene and 52.55 kDa
protein gene, partial
cds



0.18 



3875535 



0.16 



<NONE> 



0.16 



0.16 



3766172 



76749 



0.16 



3044086 



0.078 



<NONE> 



0.072



DESCRIPTION 



Iz.Ctoll) similar to nooiunaser 
cbfNAfcM tivlBL:Ub955. 
comes from this gene; cDNA 
EST EMBL:D65938 comes 
from this gene; cDNA EST 
yk280h9.3 comes from this 
gene; cDNA EST yk280h9.5 
comes from this gene; cDNA 
EST yk223d11.3 come...



P VALUE 



<NONE> 



1076190 



0.072 



3511143 



1e-19



<NONE> 



(AF057298) ornithine
decarboxylase antizyme 2 [Mus
musculus]



hypothetical protein 4 - fowl 
adenovirus I 



4.2 



4.0 



(AF055904) unknown
[Myxococcus xanthus]



<NONE> 
cell wall glycoprotein, 75K,
precursor - diatom
(Cylindrotheca fusiformis)
>gi|515363 (X80394) P75K
gene product [Cylindrotheca
fusiformis]



0.60 



<NONE> 



6.3 



(AF061244) unknown
[Agrocybe aegerita]



6.2 








Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ
ID


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






660


AL021491


Caenorhabditis
elegans cosmid
Y44A6B, complete
sequence
[Caenorhabditis
elegans]


0.070 


<NONE> 


<NONE> 


<NONE> 


661 


M33874 


X.laevis Xotch 
protein mRNA, 
complete cds. 


0.070 


1654096 


(Y09076) RAD3
[Schizosaccharomyces pombe]


0.23 


662 


AB012725 


Mus musculus 
ZAN75 mRNA for
zinc finger protein, 
complete cds 


0.069 


1350800 


MITOCHONDRIAL
RIBOSOMAL PROTEIN S5


2.0 


663 


AL021491 


Caenorhabditis
elegans cosmid
Y44A6B, complete
sequence
[Caenorhabditis
elegans]


0.068 


<NONE> 


<NONE> 


<NONE> 


664 


Z60318


H.sapiens CpG DNA, 
clone lei, reverse 
read cpslel.rla . 


0.068 


1280134 


(U55376) F16H11.2 gene
product [Caenorhabditis
elegans]


2.6 


665 


Z35973


S.cerevisiae 
chromosome II 
reading frame ORF 
YBR104w 


0.068 


2493000 


PROBABLE SUCCINYL-
COA:3-KETOACID-
COENZYME A
TRANSFERASE PRECURSOR
EMBL:Z14816 comes from this
gene; cDNA EST
EMBL:Z14946 comes from this
gene; cDNA EST
EMBL:D69746 comes from this
gene; cDNA EST yk219b6.3
comes from this gene; cDNA
ES...


0.68


666 


Z86111


Streptomyces lividans
rpsP, trmD, rplS,
sipW, sipX, sipY,
sipZ, mutT genes and
4 open reading
frames


0.068 


1235974 


(X96713) collagen [Globodera
pallida]


4e-04 


667 


M729S0 


Anthonomus grandis 
vitellogenin gene 
(VTG), complete cds. 


0.068 


3242750


(AC005164) match to ESTs
AA731149 (NID:g214013S), 
AA73190S (NID:g27527l9). 
AA287837 (NID:gl933519), 
AA262811 (NID:glS9S3S2), 
and AAS25S20(NID:g2S99l32) 


1e-59






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



668 



i 



P VALUE 



M34161 



669 | L03811



670 M64983 



671 | AF014051



672 Y07540 



Rat tachykinin (PPT) 
gene, exons 5 and 6. 



0.067 



I 



673 AJ000347 



674 | L19979



Aspergillus niger zinc
finger protein (creA)
gene, complete cds.
Human fibrinogen
beta chain gene,
complete mRNA. >
gb|I47706|I47706
Sequence 3 from
patent US 5639940



Nicotiana tabacum 
Mg chelatase subunit 
(ChlH) mRNA,
partial cds 



H.sapiens sil gene 



Rattus norvegicus
mRNA for 3'(2'),5'-
bisphosphate
nucleotidase



675 



X08050 



676 | X17115



Yeast tRNA-Glu(3) 
gene and flanking 



regions



Human mRNA for 
IgM heavy chain 
complete sequence 



677 



AF032871 



Homo sapiens 
uncoupling protein 3 
(UCP3) gene, exon 1 
and partial exon 2 



0.067 



0.067 



0.067 



0.067 



0.067 



Squid sodium channel 
mRNA, complete cds. | 0.067



0.067 



0.067 



0.067 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



DESCRIPTION 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



92331 



129238 



glycoprotein GP330, renal - rat
(fragments)

25 KD OOKINETE SURFACE
ANTIGEN PRECURSOR
(PRS25) >gi|320962|pir||A44966
25k ookinete surface antigen
precursor - Plasmodium
reichenowi reichenowi]



2128473 



1334398 



1731331 



hypothetical protein MJ0750 
Methanococcus jannaschii 
>gi|1592304 (U67521) 
ferredoxin-type protein



(X15081) MURF2 protein (AA
1-348) 



<NONE> 



7.5 



7.4 



1.5 



HYPOTHETICAL 11.6 KD
PROTEIN CY49.14C
>gi|1370241|gnl|PID|e247089
(Z73966) hypothetical protein
Rv2075c [Mycobacterium
tuberculosis]



0.65 



112900 



ALPHA-2C-1 ADRENERGIC
RECEPTOR human >gi|178194
(J03853) kidney alpha-2-
adrenergic receptor [Homo
sapiens] >gi|1628638 (U72648)
alpha2-C4-adrenergic receptor
[Homo sapiens]



0.51 



0.50 








Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 












678


X05319


Mouse class II MHC
E-beta 2 (d) gene
exon 3


0.067


585074


DYNAMIN 3 (DYNAMIN,
TESTICULAR) rat
>gi|391872|gnl|PID|d1003668
(D14076) testicular dynamin
[Rattus norvegicus]


3e-04 


679 


AB006362 


Candida albicans 
CaSLNl gene, 
complete cds 


0.067 


3417296 


(AC003007) Unknown gene
product (partial) [Homo sapiens] 


9e-56 


680 


AF021236 


African horse 
sickness virus capsid 
VP3 (L3) mRNA, 
complete cds 


0.066 


* # 

<NONE> 


<NONE> 


<NONE> 


681 




Helicobacter pylori,
strain J99 section 68
of 132 of the
complete genome


0.066








682 


AF039717 


Caenorhabditis 
elegans cosmid 
R13H8 


0.066 


<NONE> 


<NONE> 


<NONE> 


683 


AF029027 


Syncerus caffer
isolate Queen
Elizabeth Mweya 14
mitochondrial DNA
control region


0.066 


<NONE> 


<NONE> 


<NONE> 


684 


AF087967 


Homo sapiens full 
length insert cDNA 
clone YU51G05 


0.066 


2982476 


(X97203) C1 protein [Beet curly
top virus]


9.5 


685 


J02037 


Baboon endogenous 
virus proviral long 
terminal repeat DNA. 


0.066 


972767 


(L37868) POU-domain 
transcription factor [Homo 
sapiens] 


7.3 


686 


* 

AF000141 


Lycopersicon 
esculentum class I 
knotted-like
homeodomain protein 
(LeT6) mRNA, 
complete cds 


0.066 


3157926 


(AC002131) Strong similarity to
extensin-like protein gb|Z34465
from Zea mays. [Arabidopsis
thaliana]


5.6 


687 


AB001746 


Bensingtonia sp.
OK255 gene for 18S
rRNA > ::
dbj|AB001747|AB001747
Bensingtonia
sp. OK259 gene for
18S rRNA


0.066 


3859889 


(AF070064) cap 'n' collar
isoform C [Drosophila
melanogaster]


0.38






Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






688


AE001461


Helicobacter pylori,
strain J99 section 22
of 132 of the
complete genome


0.065 


<NONE> 


<NONE> 


<NONE> 


689 


M30821


Chicken erythroid 
transport proteins cl 
and c2 


0.065 


<NONE> 


<NONE> 


<NONE> 


690 


AB009802 


Homo sapiens gene 
for osteonidogen, 
intron 3 


0.065


<NONE>


<NONE>


<NONE>


691 


AF086062 


Homo sapiens full 
length insert cDNA 
clone YZ06B11


0.065 


<NONE> 


<NONE> 


<NONE> 


692 


AB002369 


Human mRNA for 
KIAA0371 gene, 
complete cds 


0.065 


2500884 


SIGNAL SEQUENCE
BINDING PROTEIN binding
protein [Synechococcus sp.]


5.5 


693 


AF086864 


Cyclopodia sp. large
subunit ribosomal
RNA gene,
mitochondrial gene
for mitochondrial
RNAs, partial
sequence >
gb|AF086866|AF086866
Penicillidia sp.
large subunit
ribosomal RNA gene,
mitochondrial gene
for mitochondrial
RNAs, partial
sequence


0.065 


3721684 


- 

(AB012957) probable glycosyl 
transferase [Vibrio cholerae] 


5.5 


694 


L44593 


bacteriophage BK5-T 
ORF410, 3' end of
cds, 20 ORFs, 
repressor protein, and 
Cro repressor protein 
genes, complete cds, 
ORF70' gene, 5' end 
of cds. 


0.065 


1172067 


PEPTIDASE T
(AMINOTRIPEPTIDASE)
influenzae Rd]


3.2 


695 


U80079 


Ciona intestinalis
MyoD-family protein
(CiMDFa) mRNA,
complete cds


0.065 


4218110 


(AL035353) contains EST 
l2b:F152Sl 


2, 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



696 | AB020718



697 | AF082137



698 | X64053



699 | U67065



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



Homo sapiens mRNA
for KIAA0911
protein, complete cds
Zea mays copia-like
retrotransposon Stl-
14 leader region,
partial sequence



0.065 



R.norvegicus ZnBP 
gene for zinc binding 
protein 



Mus musculus 
butyrophilin (BTN) 
gene, promoter region 
and complete cds 



0.065 



0.065 



700 | M64862



701 | K02205



702 | X58282



Rat matrin F/G 
mRNA, complete cds. 



Yeast (S.cerevisiae) 
transcriptional 
activator of amino 
acid-biosynthetic 
genes (GCN4) gene, 
complete cds 



0.065 



0.065 



Maize mRNA for a
high mobility group
protein



0.064 



703 | AC001545



704 | AF023461



705 | U50307



706 | U46542



707 | X57564



Homo sapiens
(subclone 1_f3 from
P1 H69) DNA
sequence 



0.064 



Homo sapiens 
FRA3B region 

sequence 

Caenorhabditis 
elegans cosmid 
F43H9. 



0.064 



Streptococcus crista
HmpA gene, partial
cds, putative
adhesin/ABC

transport system
protein (scbA) gene,

complete cds



0.064 



0.064 



ACCESSION



DESCRIPTION 



P VALUE



A.rusticana mRNA
for neutral peroxidase



0.064 



0.064 



MINOR CAPSID PROTEIN L2
1?22734 |>gi| 1020192 type 231 



(U89278) polyhomeotic 2
1877501 | homolog [Homo sapiens]



2132252 



3420183 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



1.9 



464963 TRYPSIN PRECURSOR 



hypothetical protein YPL263c

- yeast

(AF041105) organic anion
transporter protein 3 [Rattus
norvegicus]



1.1



0.36 



3e-10



4e-19



<NONE> 



<NONE> 



<NONE> 



<NONE> 



1209391 



(D83659) TPR protein pombe] 

>gi|2894282|gnl|PID|cl25U03 
(AL02I83S) pre-mrna splicing 
factor. [Schizosaccharomyces
pombe]



<NONE> 



<NONE> 



<NONE> 



1492037 



(U60315) MC094R [Molluscum
contagiosum virus subtype 1]



9.2 



6.9 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION



708 



DESCRIPTION 



luman alpha- 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



U06986 



macroglobulin 
receptor/lipoprotein
receptor protein 
(A2MR/LRP) gene, 
exons 39-41. 



709 



Human CpG island 
sequence, clone
D85773 Q28B8 



710 



L06178 



Apis mellifera 
ligustica complete 
mitochondrial 
genome



711 



Y16242



Triticum aestivum 
mRNA for beta- 
amylase 



712 



L81779 



713 



714 



Homo sapiens
(subclone 2_a2 from
P1 H25) DNA
sequence 



X13826 



X06487 



715 



C.reinhardtii psbl 
mRNA for OEE1 
protein of 
photosystem II 
(oxygen-evolving 
enhancer protein) 



H.sapiens mRNA for
bcl2-Ig fusion gene



U79638 



Mus musculus cyclin-
dependent kinase
inhibitor protein
(p15(INK4b)) gene,
exon 2 and partial cds



0.064 



0.064 



0.064 



0.064 



0.064 



0.064 



0.064 



0.064 



100800 



2245382 



3695379 



1175958 



3845169 



171040 



P VALUE 



rabl5B protein - wheat 
>gi|21853 (X62476) rab protein 
[Triticum aestivum] 



(U88325) suppressor of 
cytokine signalling-1 [Mus 

musculus] 

(Ahoyoi/U) contains similarity 

to a C. elegans hypothetical 

protein F44G4.1 (GB:249910) 

and several yeast hypothetical 

proteins such as 35.1 KD 

protein in NAM8-GAR1 

intergenic region (SP:P38805) 

[Arabidopsis thaliana]
HifPOlRbl lLAL W.:>KD 

PROTEIN IN AGP3-DAK3

INTERGENIC REGION 

>gi|1084712|pir||S56201 

probable membrane protein 

YFL054c - yeast 

(Saccharomyces cerevisiae) 

>gi|836701|gnl|PID|d1009825

(D50617) YFL054C



5.3 



5.3 



(AE001391) phosphatase (acid 
phosphatase family) 



(M94535) ATPase 
[Saccharomyces cerevisiae] 
cerevisiae, Peptide, 377 aa] 
[Saccharomyces cerevisiae]



2429362 



(AF020261) proline rich protein
[Santalum album]



3929221 



(AF032557) TRF1-interacting
ankyrin-related ADP-ribose
polymerase [Homo sapiens]



3.2 



3.1 



0.81 



0.054 



0.016 



1e-10



SEQ
ID 



717 



Nearest Neighbor (BlastN vs. Genbank) 




ACCESSION 



DESCRIPTION 



716 | U39099



P VALUE 



Human T cell 
receptor alpha chain 
mRNA, partial cds



Clostridium 
acetobutylicum KdpC 
(kdpC) gene, partial 
cds. sensor histidine 
kinase homolog 
(kdpD) and response 
regulator homolog 
(kdpE) genes. 



0.063 



U39673 complete cds 
Human DNA 



sequence from clone
140L1 on

chromosome 22q13.1-
13.31, complete
sequence [Homo
AL022317 | sapiens]

Spiroplasma citri orfa
and orff genes, partial
cds, orfb, orfc, and
orfe genes and
Spiroplasma virus
SpV1-derived ORF1
and ORF3 genes,
complete cds, and
SpV1-derived ORF14
gene, partial cds



0.063 



0.063 



719 U28972 



720 U15159 



721 | AF058416



Mus musculus limk 
kinase (limk) mRNA, 
complete cds 



722 | AE001430 



Homo sapiens
lipoprotein receptor-
related protein
(LRP1), exons 39, 40,

and 41

Plasmodium 

falciparum 
chromosome 2, 
section 67 of 73 of 
the complete 
sequence 



0.063 



0.063 



0.063 



0.063 



ACCESSION 



DESCRIPTION 



<NONE> 



<NONE> 



1931640 



4091939 



3638957 



1788123 



2244S49 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



(U95973) Serine
carboxypeptidase isolog
[Arabidopsis thaliana]



<NONE> 



5.2 



envelope
glycoprotein [Human
immunodeficiency virus type 1]
(AC004877) sco-spondin -mucin
like; similar to P98167 uncertain
[Homo sapiens]



5.2 



5.1 



(AE000276) orf, hypothetical
protein [Escherichia coli]



4.0 



1(297337) hypothetical protein 



\ 



4.0 






Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ
ID


ACCESSION


DESCRIPTION


P VALUE


ACCESSION


DESCRIPTION


P VALUE


723


L29323


Streptococcus
pneumoniae methyl
transferase gene
cluster, complete
sequence


0.063


3874077


(Z70203) cDNA EST
EMBL:D72319 comes from this
gene; cDNA EST
EMBL:D75197 comes from this
gene [Caenorhabditis elegans]




724 


X72631 


H.sapiens mRNA
encoding Rev-
ErbAalpha > ::
emb|X72632|HSREV
ERB2 H.sapiens
mRNA encoding Rev-
ErbAalpha (internal
fragment)


0.063


3979878


U/MUb) predicted using 
Genefinder; cDNA EST 
EMBL:T01277 comes from this 
gene; cDNA EST 
EMBL:T01796 comes from this 
gene; cDNA EST 
EMBL:D32545 comes from this 

EMBL:D33060 comes from this 
gene; cDNA EST EMBL;D... 


2.3 j 
1.7 


725 


U17969


Human initiation
factor eIF-5A gene,
complete cds.


0.063




(AF025467) contains similarity 
to drosophila DNA-binding 
protein K10 (NID:g8148) 
[Caenorhabditis elegans]


• * 

1.4 


726 


AE001000


Archaeoglobus
fulgidus section 107
of 172 of the
complete genome


0.063


3462802 


(AF082486) nef protein [Human 
immunodeficiency virus type 1] 


0.35 


727 


S80986 


svp[40]=svp-related
nuclear
receptor/retinoid
signaling modulator
[zebrafishes, mRNA,
3876 nt]


0.063


1326288 


(U58734) weak similarity to 
anKynn vj n^aenornaoaitis 
elegans] 


0.093 


728 


AF109134


Homo sapiens 7-60
mRNA, complete cds


0.063


1083764


proline- rich proteoglycan 2 
precursor, parotid - rat 
^gipiuzuu ^li / j ioj prohne- 
rich proteoglycan [Rattus 
norvegicus]


0.001 


729


D87466


Human mRNA for
KIAA0276 gene,
partial cds


0.063


2879865


(AL021816) SPBC24E9.03c,
unknown, len:251aa
[Schizosaccharomyces pombe]


6e-05 


730


AB018269


Homo sapiens mRNA
for KIAA0726
protein, complete cds


0.063


2995865


(AF053455) tetraspan TM4SF
[Homo sapiens]


2e-16 


731


DS6954


Cricetulus griseus
mRNA for
Cytochrome P-450
A 1 4, complete cds


0.063


2496S96


HYPOTHETICAL 4V.6 KD
PROTEIN C16C10.5 IN
CHROMOSOME III
>gi|3S743S3|gnl|PID|e1344077
type (RING finger)
[Caenorhabditis elegans]


1e-22






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



Plasmodium



P VALUE 



falciparum DNA ***

SEQUENCING IN

PROGRESS ***
from contig 4-58,
732 | AL010232 | complete sequence



733 



Mycoplasma 
gallisepticum 
haemagglutinin 
precursor genes, 
U90714 | complete cds

Homo sapiens clone
pCL4 DNA-binding
protein SOX21
(SOX21) gene,
734 | AF107044 | complete cds



0.062 



Caenorhabditis
elegans Ro

ribonucleoprotein
autoantigen mRNA,
735 | L41729 | complete cds



0.062 



0.062 



736 | Z99287



Caenorhabditis
elegans cosmid
Y7A9D, complete
sequence

[Caenorhabditis
elegans]



737 



738 



Homo sapiens mRNA
for KIAA0614
AB014514 | protein, partial cds
Human germline
immunoglobulin light
chain variable region
(lambda-IIIb
subgroup) from IgM
L29165 | rheumatoid factor.



0.062 



0.062 



0.062 



739 | U09364



Schistosoma
japonicum Chinese
clone pY6
paramyosin mRNA,
partial cds.



0.062 



740 | Y16242



Triticum aestivum
mRNA for beta-
amylase



0.062 



0.062 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



P VALUE



<NONE>



<NONE>



<NONE>



<NONE>



<NONE>



<NONE>



<NONE>



<NONE> 



<NONE> 



2983060 



(AE000687) putative protein 



1176542 



[Aquifex aeolicus]
PUTATIVE

SERINE/THREONINE-
PROTEIN KINASE D1044.3
IN CHROMOSOME III
>gi|495684 (U00065) contains
EGF-like repeats; highly similar

to ZC34.1; 3' exons similar to

protein kinase [Caenorhabditis

elegans]



8.6 



5.8 



4033395 



DNA GYRASE SUBUNIT B
subunit [Myxococcus xanthus] 



3.9 



1914685 



(Y12014) RAD23 protein,
isoform II



1350800 



79834 



MITOCHONDRIAL 
RIBOSOMAL PROTEIN S5 



hypothetical protein 1246 (uvrA 
region) - Micrococcus luteus 
(fragment) 



1.3 



1.3 



0.59 








Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION


DESCRIPTION


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






Leishmania pifanoi 






TROPOMYOSIN I (TMI) 




741 


M97695 


cysteine proteinase 

( /"* i / C \ nana f+r\mr% 1 &t t. 

^t-ys*.^ gene, compicit 
cds. 


0.062 


1174754 


(POLYPEPTIDE 49) 
>gi|320989|pir||A60607 
tropomyosin - fluke 


0.018 


742 


U67526 


Methanococcus
jannaschii section 68

of 150 of the
complete genome


0.062 


1330345 


lUi^S) coded for by C.
elegans cDNA yk34b1.5; coded
for by C. elegans cDNA
yk13h10.5; coded for by C.
elegans cDNA yk46e8.5; coded
for by C. elegans cDNA
yk46d5.5; coded for by C.
elegans cDNA yk43c2.5; coded
for by C. elegans cDNA
yk46e8....


1e-40


743 


Z78414 


Caenorhabditis
elegans cosmid
W09D12, complete
sequence
[Caenorhabditis
elegans]


0.061 


<NONE> 


<NONE> 


<NONE> 


744 


Y13606 


Mus musculus gene 
encoding filensin, 
exons 6, 7 


0.061 


2314715 


(AE000651) H. pylori predicted 
coding region HP 1527 


4.9 


745 


J04374 


Eggplant mosaic
virus genome.


0.061


141449 


HYPOTHETICAL 35.0 KD
PROTEIN IN TRANSPOSON
TN4556 >gi|80759|pir||JQ0431
hypothetical 35.5K protein -
Streptomyces fradiae transposon
Tn4556


3.8 


746 


AB022200 


Marine obligately

oligotrophic
bacterium POO-10
gene for 16S
ribosomal RNA,

partial sequence


0.061


■ 

3983593 


(AB000307) transcarboxylase-
beta


2.2 


747


X54250


Rat mRNA for zinc
finger protein AT-
BP2, partial cds


0.061


1377886 


[L46S15) DNA binding protein 
R.c [Mus musculus] 


0.98


748


X69942


M.musculus mRNA
of enhancer-trap-
locus 1


0.061


2983969


(AE00074S) putative protein
[Aquifex aeolicus]


0.57 


749 


AJ223206 


Mus musculus mRNA
for scrapie responsive
protein 1


0.061


4204265


(AC005223) 45643
[Arabidopsis thaliana]


5e-31 


750 


Y10205


H.sapiens mRNA for
ID88 protein


0.060 


<NONE> 


<NONE> 


<NONE> 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



P VALUE 



751 | U79260 



752 | X07453



Human clone 23745 
mRNA, complete cds



Plasmodium 
falciparum 11-1 gene 

part 1

Rattus norvegicus 
protein tyrosine 
phosphatase delta 
gene, catalytic 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
ACCESSION 



753 | U57502 domain, partial cds



754 



M.fascicularis gene
for apolipoprotein C-
X68359 III



Pseudomonas braB

gene for branched 

chain amino acid 

transport carrier (LIV 
755 | X51634 II)



0.060 



0.060 



0.060 



0.060



756 | AF0724QS 



Gossypium hirsutum 
cotton fiber expressed 
protein 2 (CFE2) 
mRNA, complete cds 



757 



Sambucus nigra
ribosome inactivating
protein precursor 



AF012899 JmRNA. complete cds Ofttfi 



758 



Rattus norvegicus 
homer-1c mRNA,
AF093268 complete cds 



0.059 



759 | X61046 



760 | AJ005813 



Hydra N-COL 2 
mRNA for mini- 
collagen, partial cds 
Arabidopsis thaliana 
mRNA for 
neoxanthin cleavage 
enzyme 



761 | S79S43 



{random amplified
hybridization
microsatellite
RAHM} [Beta
vulgaris=sugar beets,
Genomic, 537 nt]



0.054 



0.053 



0.052 



DESCRIPTION 



0.025 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



P VALUE 



<NONE> 



<NONE> 



730843 



1835622 



2662481 



547847 



<NONE> 



<NONE> 



(AF044915) polar tube protein 
3452285 hTP55 precursor 



1730145 



0.28 



SHUTTLE CRAFT PROTEIN 
>gi|487400 | 2e-04



(U85718) CCML [Pseudomonas
putida GB-1]



8.1 



alkaline phosphatase, 145K 
Synechococcus sp. 



4.7 



(/win qov) juvenile hormone 
resistance protein 



3.3 



LECTIN PRECURSOR



7.0 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



GAMETOGENESIS
EXPRESSED PROTEIN GEG-
154 >gi|2137331|pir||I4S361
gene GEG-154 protein - mouse
>gi|550123 (X71642)
pid:g550123 [Mus musculus]



2e-16 






Nearest Neighbor (BlastN vs. Genbank) 



SEQ 

ID | ACCESSION



DESCRIPTION 



762 



763 



764 



AB000096 



Mouse mRNA for 




Z62366 



LU670 



GATA-2 protein, 
complete cds 



H.sapiens CpG DNA, 
clone 67h7, forward 
read cpg67h7.ft1a



P VALUE | ACCESSION



0.023 



Human 
transmembrane 
glycoprotein (CD53) 
gene, exons 2 through 
8. 



765 



766 



767 



D83984 



Sulculus diversicolor 
DNA for IDO-like
myoglobin, complete 
cds 



X98890 



U58835 



S. tuberosum mRNA 
for inorganic 
phosphate 
transporter, StPT1



0.023 



Dissostichus mawsoni 
preprotrypsin gene, 
complete cds 



Glomus versiforme 
chitin synthase gene
768 | AJ009630 (clone Gvchs3)



769 



Human glucagon 
J04040 mRNA, complete cds



770 



X74908 



L.esculentum Asr3 
gene



771 | L07293 



Shigella dysenteriae 
O-antigen 
polysaccharide 
biosynthesis rfbX. O- 
antigen polymerase 
(rfc), rhamnosyl 
tranferase I and II 
(rfbR and rfbQ) and 
rfbD genes, complete
cds. 



0.023 



0.023 



0.023 



0.022 



0.022 



0.022 



0.022 



0.022 



DESCRIPTION 



3123312 



80636 



3114665 



683532 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



P VALUE 



<NONE> 

ZINC FINGER PROTEIN 142 
(KIAA0236) to Human zinc 
finger protein(ZNF142) [Homo 
sapiens] 



5.9 



hypothetical 67K protein - 
Mycobacterium fortuitum 
plasmid pAL5000 >gi| 149986 
(M60875) ORF2 



(AF061267) inner membrane
component HtxE [Pseudomonas 
stutzeri] 



(X02155) thyroglobulin [Bos 
taurus] 



3.4 



3.4 



1.1 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION | P VALUE



Mus musculus
inositol
polyphosphate 5-
phosphatase II
(INPP5P) mRNA,
772 | AF040094 complete cds



H.sapiens HLA-DMB 
773 | X76776 gene



0.022 



0.022 



Helicobacter pylori, 
strain J99 section 82 
of 132 of the 

774 | AE001521 complete genome | 0.022

A.longa rbcL, rpl5,
rps3, rpl36, rps14,
rps2, trnUmF, trnC
and rpoB (partial)
genes > ::

emb|X75651|ALRIBp|
A.longa plastid genes
for ribosomal
proteins, tRNAs,
RNA polymerase
subunit beta and
rubisco large subunit | 0.022



775 I X16004 



776 | Y12707



Lactococcus lactis
cremoris plasmid
pHW393 DNA,
Irtladii, mlladii genes 



0.022 



777 | U27118




Arabidopsis thaliana
glutamyl-tRNA 
reductase 



0.022 




H.sapiens telomeric
DNA sequence, clone
5PTEL002, read 



5PTELOO002seq | 0.022



Sulculus diversicolor 
DNA for IDO-like 
myoglobin, complete 
cds 



0.022 



H.sapiens flow-sorted 
chromosome 6 
HindIII fragment,
SC6pA4A3 



0.022 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



<NONE> 
<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



1078509 



4204206 



DESCRIPTION 



<NONE> 
<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



P VALUE 



(J05503) carbamoyl-phosphate 
19133 3 synthetase (E.C.6.3.5.5)



probable membrane protein 
YDR018c - yeast



(AB022786) N-acetyl-beta-D- 
glucosaminidase [Enterobacter 
bp- 1 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE>



<NONE> 



9.8 



9.7 






7.5 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



P VALUE 



Xenopus laevis 



781 | M10217



mitochondrial DNA, 
complete genome. 



782 



783 



M55147 



X58839 



784 | M26185



785 



786 



Pea chloroplast 
glyceraldehyde-3- 
phosphate 
dehydrogenase 
(Gpbl) gene, 
complete cds. 



Acholeplasma virus 
MV-L1 DNA for
complete circular 
genome 



Mouse c-myb 
oncogene, exon 1 and 
exon 2 (partial). 



AF061195



Streptomyces albus 
valine dehydrogenase 
(Vdh) gene, complete 
cds 



Homo sapiens alpha
1,2-mannosidase IB
AF053622 gene, exon 9



787 



S.cerevisiae
chromosome XIV 
reading frame ORF 
Z71500 YNL224c 



788 



D10471 



Herpes simplex virus 
type 2 genomic DNA 
for 0.74-0.84 region, 
complete cds



0.022 



789 | U43082



Zea mays T
cytoplasm male
sterility restorer
factor 2 (rf2) mRNA,
complete cds



0.022 



0.022 



0.022



0.022 



0.022 



0.022 



0.022 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION



DESCRIPTION 



0.022 



2145763 



417308 



3273189 



138592 



2088768 



1352361 



1708875 



3132276 



3319720 






B2168_C2__205 protein 
Mycobacterium leprae 



PROBABLE HELICASE 
MOT1 Mot1p is a probable
helicase essential for vegetative
growth on rich glucose medium
at 30 degree C; Swiss-Prot
Accession number P32333;
similar to S. cerevisiae RAD26
gene product; Swiss-Prot
Accession number P40352



(AB008757) subunit II of 
c(o/b)3-type cytochrome c 
oxidase [Bacillus 
stearothermophilus]

VITELLOGENIN I

PRECURSOR (YOLK 
PROTEIN I) 
>gi|72270|pir||VJFFI 
vitellogenin I precursor 
unnamed protein product 
[Drosophila melanogaster] 



(AF003145) B0414.8 gene 
product [Caenorhabditis 
elegans]



EARLY GROWTH 
RESPONSE PROTEIN 1 fish 
>gi|531456 (U12895) egr1
[Danio rerio] rerio] 



PUTATIVE TUMOR 
SUPPRESSOR LUCA15 
sapiens] 



(AB0I14S6) short ORF [TT 
virus] 



(AL031035) putative aldehyde
dehydrogenase [Streptomyces 
coelicolor] 



P VALUE 



7.3 



4.2 



4.1 



2.5 



0.86 



0.36 



0.16 



0.1 



0.011 






794 



795 



ACCESSION 



DESCRIPTION 



790 | X86913



791 | AF100694



799 



800 



P VALUE 



H.sapiens simple 



792 | U34016



793 | X00845 



tandem repeat DNA 
(clone w°3a6) | 0.021

Mus musculus
Pontin52 mRNA,
complete cds | 0.021

Nannostomus sp.
large subunit rRNA
gene, mitochondrial
gene encoding
mitochondrial rRNA,
partial sequence. | 0.021
Yeast mitochondrial 
genes for 15S rRNA 
and tRNA-Trp 



Homo sapiens gene
for CC chemokine
PARC precursor,
AB012113 complete cds



Daucus carota 
globulin-like protein 
(Gea8) gene, 
U62395 complete cds 



P.falciparum actin II 
796 | M22718 gene, complete cds.



Arabidopsis thaliana
glutamyl-tRNA

797 | U27118 reductase

H.sapiens CLN3

798 | X99832 gene, complete CDS



Homo sapiens TRAIL 
receptor 2 mRNA, 
AF016266 complete cds



292541 



Human DNA 
sequence from PAC 
179115, BRCA2 gene 
region chromosome 
13q12-13 contains
lactase-phlorizin 
hydrolase (LCT) 



0.021 



0.021 



ACCESSION 



0.021 



0.021 



0.021 



DESCRIPTION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



2623773 



3549885 



262249 



729048 



585820 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



(AF004835) tyrocidine 
synthetase 3 [Brevibacillus 
brevis] 



(AJ006631) cysteine-rich 
secretory protein-1 [Equus
caballus]



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



8.8 



(S52010) orfl 5' of EpoR [mice, 
Peptide, 85 aa] [Mus sp.]



SUCCINYL- 
COA:COENZYME A 
TRANSFERASE transferase 
[Clostridium kluyveri]



8.7 



H7 
N- 

ACETYLGLUCOSAMINETR 
ANSFERASE>gi|466761 
(U00039) rfaK [Escherichia 
coli] >gi| 1790053 (AE000440) 
probably hexose transferase;
lipopolysaccharide core
biosynthesis



5.: 






SEQ 
ID 



801 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



dopamine D2



P VALUE 



S58588 




M60522 



803 



804 



AF045654 



receptor [human, 
brain, Genomic, 3794 
nt, segment 4 of 5]



Rat nerve growth 
factor-inducible 
protein (VGF) gene. 
complete cds. 



Gallus gallus
neuregulin beta-1a
mRNA, complete cds 



M69023 Human globin gene.



805 



806 



Z65960 



H.sapiens CpG DNA,
clone 69d2, reverse
read cpg69d2.rt1b



A.oligospora gene
X97073 encoding lectin



807 | X56491



D. melanogaster
mRNA for gene
(containing opa
repetitive element



808 



L78760 



Homo sapiens 
(subclone 1_f6 from
P1 H31) DNA
sequence 



809 



810 



Homo sapiens 
KIAA0404 mRNA, 
AB007864 partial cds 



0.021 



0.021 



0.021 



0.021 



0.021 



0.021 



0.021 



0.021 



0.021 



Mycobacterium
tuberculosis H37Rv 
complete genome; 
AL021932 segment 22/162 



0.021 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



2677620 



4103934 



(Y08029) NAD(P)(+)--arginine
ADP-ribosyltransferase
[Oryctolagus cuniculus] 



5.1 



(AF030050) replication factor C
[Rattus norvegicus] | 3.j 



2746829 



3880259 



1707245 



116949 



(AF040647) No definition line 
found [Caenorhabditis elegans]



(Z82056) T26H5.8

[Caenorhabditis elegans]

>gi|3880787|gnl|PID|e1350288
(AL032620) T26H5.8



(U80845) similar to family 1 of 
G-protein coupled receptors 
[Caenorhabditis elegans]



CORE ANTIGEN 
>gi|73601|pir||NKVLC2 core 
antigen - woodchuck hepatitis
virus 2 >gi|336135



2842750 



HOMEOBOX PROTEIN DLX- 



7 >gi|1620520 



113671 



11S144 



2909514 



!!!! ALU CLASS F WARNING
ENTRY !!!!



CYSTEINE SYNTHASE A (O-
ACETYLSERINE
SULFHYDRYLASE A) (O-
ACETYLSERINE (THIOL)-
LYASE A) (CSASE A)
>gi|68323|pir||SYEBAC cysteine
synthase (EC 4.2.99.8) A -
Salmonella typhimurium
>gi|153935 (M21450) cysK
protein [Salmonella
typhimurium]



(AL021932) hypothetical
protein Rv0439c 



3.0 



2.4 



0.79 



0.47 



0.16 



0.15 



0.12 



7e-10








Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















811 


U89991


Hypocrea jecorina

mannose- 1 -phosphate 
guanylyltransferase 
(MPG1) mRNA, 
complete cds 


0.021 


3581924 


(AL031538) mannose-1-
phosphate guanyltransferase
[Schizosaccharomyces pombe]


6e-20 


812 


X00641 


Sugar beet 
mitochondrial 
minicircle pO 
sequence 


0.020 


<NONE> 


<NONE> 


<NONE> 


813 


Z50097 


D.melanogaster 
mRNA for hdc 
protein. 


0.020 



<NONE> 


<NONE> 


<NONE> 


814 


AF044866 


Phoebis sennae large 
subunit ribosomal 
RNA gene, partial 
sequence; tRNA-Val 

gene, complete

sequence; and small 
subunit ribosomal 
RNA gene, partial 
sequence, 

mitochondrial genes 
for mitochondrial
RNAs 


0.020


<NONE> 


<NONE> 


<NONE> 


815 


AF074386 


Sambucus nigra 
mRNA, complete cds 


0.020 


<NONE> 


<NONE> 




<NONE>


816 


AF027174 


Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath-
B) mRNA, complete
cds


0.020


<NONE> 


<NONE> 


<NONE> 


817 


AE001405 


Plasmodium 
falciparum 
chromosome 2, 
section 42 of 73 of 
the complete 
sequence 


0.020 


2196776 


(AF003342) bunched gene 
product [Drosophila 
melanogaster] 


8.4 


818


AF074387 


Sambucus nigra 
hevein-like protein
mRNA, complete cds 


0.020 


627071 


histidine-rich protein - 
Plasmodium lophurae 


2.S 



4 








Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















819 


Y13304 


Hylobates hoolock 
mitochondrial DNA 
for cytb gene, Horace 


0.020 


285580 


(D10043) ORF [Acetobacter
pasteurianus] 


2.1 


820 


Z66539 


H.sapiens creatine 
transporter gene 


0.020 


1703594 


(U8U4j9) coded for by C.
elegans cDNA yk7c3.5; coded
for by C. elegans cDNA
yk133b3.5; coded for by C.
elegans cDNA yk65a4.5; coded 
for by C. elegans cDNA 
yk7c8.3; coded for by C. 
elegans cDNA CEESQ66F; 
coded for by C. elegans cDNA 
yk65a4.3;... 


0.98 


821 


AF053622 


Homo sapiens alpha 
1,2-mannosidase IB
gene, exon 9 


0.020 


1352361 


EARLY GROWTH 
RESPONSE PROTEIN 1 fish 
>gi|531456 (U12895) egr1
[Danio rerio] rerio]


0.72 


822


M20555 


Human MHC class II 
HLA-DRw53-beta 
(DR4,w4) gene, 
exons 2,3,4,5,6. 


0.020 


465569 


H I Y\J i tih 1 ILAL JS. 1 KJU 
PROTEIN IN SBCB-HISL 
INTERGENIC REGION
>gi|405956 (U00009)
ORF_ID:o349#4; similar to
[SwissProt Accession Number
P33015] [Escherichia coli]
>gi|1736693|gnl|PID|d1016570
Number P33015] [Escherichia
coli] >gi|1788323 (AE000292)
putative transport system 
permease protein [Escherichia 
coli] 


0.43 


823 | M20555


Human MHC class II 
HLA-DRw53-beta 
(DR4,w4) gene, 
exons 2,3,4,5,6. 


0.020 



1709751 


COENZYME PQQ 
SYNTHESIS PROTEIN F 
synthesis F - Pseudomonas 
fluorescens >gi|929802


0.42 






Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



P VALUE 



824 | AJ005015



Homo sapiens mRNA 
for putative SMC-like
protein, partial



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



825 



Laccaria bicolor
glyoxal malate
synthase protein
AF034099 mRNA, complete cds



826 | AF100694



827 | AF093268



828 | AF100694



Mus musculus
Pontin52 mRNA, 
complete cds 



Rattus norvegicus 
homer-lc mRNA, 
complete cds 



0.020 



267449 



PROTEIN ZK637.2 IN
CHROMOSOME III
>gi|102507|pir||S15787
hypothetical protein 1 (cosmid
ZK637) - Caenorhabditis
elegans Genefinder; cDNA EST
yk217b5.3 comes from this 
gene;cDNA EST yk217b5.5 
comes from this gene; cDNA 
EST yk340gl2.3 comes from 
this gene; cDNA EST 
yk340gl2.5 comes from this 
gene; cDNA EST yk428c5.5 

CO... 



ins) 



P VALUE 



1e-12



0.020 



1109847 



(U41538) No definition line 
found [Caenorhabditis elegans]



0.019 



132836 



Mus musculus 
Pontin52 mRNA,
complete cds 



60S RIBOSOMAL PROTEIN
L28 protein L28 [Rattus
norvegicus]



1e-22



5.7 



0.019 



2633401 



(Z99109) similar to DNA
exonuclease 



4.5 



0.019 



2492604 



MULTIDRUG RESISTANCE 
PROTEIN CDR2 albicans]



829 | U67538



830 



U56088 



831 



U76524 



Methanococcus 
jannaschii section 80 
of 150 of the 
complete genome 



Human periodic 
tryptophan protein 2 
(PWP2) gene, exons 
3 to 14

Sambucus nigra 
ribosome inactivating
protein precursor 
mRNA, complete cds 



0.019 



1723566 



PUTATIVE
GLUCOSYLTRANSFERASE
C17C9.07
>gi|1314159|gnl|PID|e241760
(Z73099) SPAC17C9.07,
putative glucosyl transferase len:
501, similar to
SW:ALG8_YEAST P40351
glucosyltransferase ALG8
pombe] | j 7



0.019 



2144804 



collagen alpha 1(II) chain
- bovine



^.040 



0.018 



1916976 



(U91682) vitelline membrane
protein homolog [Aedes
aegypti]



7.2 








Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE | ACCESSION 


DESCRIPTION 


P VALUE 
















832 


AF026258 


Onobrychis viciifolia 
chalcone synthase 
(CHS) mRNA, 
complete cds 


0.018 



763076 


(Z48799) ZP3 [Cyprinus carpio]
>gi|777724 (L41637) egg 
membrane protein [Cyprinus 
carpio] 


5.2 


833 


U95094 


Xenopus laevis XL- 
INCENP (XL- 
INCENP) mRNA, 
complete cds 


0.009 


3955011 


(AJ005438) beta adrenoceptor 
B 


0.60 




X71603 


C.jejuni VS1 DNA > ::

emb|Ajyouj|Ajyouj
Sequence 2 from
Patent WO9417205 >
:: gb|I76090|I76090
Sequence 2 from
patent US 5691138


0.008 


<NONE> 


<NONE> 


<NONE> 


835


AF093268 


Rattus norvegicus 
homer-1c mRNA,
complete cds 


0.008 


138116 


HEAD FIBER PROTEIN 
(LATE PROTEIN GP8.5) 
>gi|75846|pir||WMBP8H gene 
8.5 protein - phage PZA 
>gi|216057 (M11813) head
fiber protein 


8.1 


836


X91751 


Bovine herpesvirus 
type 1 UL7 gene


0.008 


1711436 


SUPEROXIDE DISMUTASE 
(FE) 1.15.1.1) (Fe)- 
Pseudomonas aeruginosa 
>gi|409767 


5.9 


837


M95594 


Arabidopsis thaliana 
1-aminocyclopropane-
1-carboxylate
synthase (ACS2) 
gene, complete cds. 


0.008 


683698 


(Z48229) orfl gene product 
[Saccharomyces cerevisiae]


1e-06


838 


U67465 


Methanococcus 
jannaschii section 7
of 150 of the 
complete genome 


0.008 


3874664 


(Z68493) predicted using 
Genefinder 


1e-07


839 


X72388 


B.taurus mRNA for
filensin


0.008 


100174 


1-aminocyclopropane-1-
carboxylate synthase 


7e-09 


840 


U22398 


Human Cdk-inhibitor
p57KIP2 (KIP2)
mRNA, complete cds.


0.008


2228750 


(U93868) RNA polymerase III 
subunit [Homo sapiens] 


2e-lS 


841 


L42546 


Xenopus laevis LIM 
class homeodomain
protein 


0.007 


<NONE> 


<NONE> 


<NONE> 






SEQ 
ID 



842 



843 



844 



845 



846 



847 



848 



849



850 



851 



852 



853 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



AF041428 



AF000227 



D86254 



AF012899



Y07738 



AJ005813 



AF055119 



M61195 



Y11050



X61204 



AB012105



S43SS2 



Homo sapiens 



ribosomal protein s4
X isoform gene, 
complete cds 



Secale cereale omega 
secalin gene. 
complete cds 



Human MHC (HLA) 
DRB intron 1 DNA, 
partial sequence 



Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds 



M.musculus gene for 
vimentin 



Arabidopsis thaliana 
mRNA for 
neoxanthin cleavage 
enzyme 



Homo sapiens alpha- 
tectorin(TECTA) 
gene, exon 6 



Zucchini 1- 
aminocyclopropane-1-
carboxylate synthase



Homo sapiens DSG3 
gene, partial intron 
and partial exon 6, 
140 bp 



M.voltae vhuD,
vhuG, vhuA, vhuU &
vhuB genes



Brassica rapa mRNA 
for SLG45, complete
cds 



telomere
{minichromosome,
repeats}
[Trypanosoma brucei,
Genomic, 1170 nt]



0.007 



0.007 



0.007 



0.007 



0.007 



0.007 



0.007 



0.007 



0.007 



0.007 



0.007 



0.007



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



DESCRIPTION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 






Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 




DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 














854 


L32674 


Geomydoecus nadleri 
mitochondrial 
cytochrome oxidase I 
gene, partial cds. 


0.007 


<NONE> 


<NONE> 


<NONE> 


855 


U58732 


Caenorhabditis
elegans cosmid
F48D6.


0.007 


<NONE> 


<NONE> 


<NONE> 


856 

\j S \j 




Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds


0.007


<NONE> 


<NONE> 


<NONE> 


857




H.sapiens mRNA for 
MDR3 P- 
glycoprotein 


0.007 


1730696 


HYPOTHETICAL 121.1 KD
PROTEIN IN BIO3-HXT17
INTERGENIC REGION
PRECURSOR YNR067c - yeast
(Saccharomyces cerevisiae)


9.5


858


X15217


Human sno oncogene 
mRNA for snoA 
protein, ski-related 


0.007 


902455 


(U24203) membrane protein 
[Escherichia coli] 


8.8 


859 


AF027173 


Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
A) mRNA, complete 
cds 


0.007 


1684636 


(Y09454) ORF3 [Lactobacillus 
casei bacteriophage A2] 


8.3 


860 


AF012899


Sambucus nigra
ribosome inactivating
protein precursor 
mRNA, complete cds 


0.007 


3878803 


(Z48795) R05H5.7 
[Caenorhabditis elegans]


8.3 


861 


S76317 


T19=180-200 kda
membrane protein
scavenger receptor
homolog {clone 18,
intron and flanking
exons 14 and 15}
[sheep, lymph node,
lymphocytes,
Genomic, 30S nt,
segment 2 of 2]


0.007 


294747 


(L08174) ORF2 
[Romanomermis culicivorax]


7.4 







P VALUE 




Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



862 D88084 



Pedicularis
verticillata
chloroplast DNA,
intergenic region
between trnT(UGU)

and trnL(UAA) 5' exon



0.007 



2555187 



863 



Chicken mRNA for 
aldehyde 
X58869 dehydrogenase



0.007 



115978



864



865 



Homo sapiens mRNA 
for GS3786, complete

D87120 cds

H. sapiens gene for
X68793 antithrombin III



0.007 



2358285 



866 



Danio rerio mRNA 
for opioid receptor 
AJ001596 homologue



0.007 



2507509 



Streptomyces albus 
valine dehydrogenase
(Vdh) gene, complete
867 | AF061195 cds



Arabidopsis thaliana
mRNA for
neoxanthin cleavage
enzyme



0.007 



2088768



(AF026789) vitellogenin
[Pimpla nipponica]



868 | AJ005813



0.007 



1710105 



CD30L RECEPTOR
PRECURSOR
(LYMPHOCYTE

cDNA EST EMBL:D35637
comes from this gene; cDNA
EST yk322a3.5 comes from this
gene; cDNA EST yk397b2.5
comes from this gene; cDNA
EST yk348b11.5 comes from
this gene; cDNA EST
yk397b2.3 comes fr...

>gi|3880965|gnl|PID|e1350578
comes from this gene; cDNA
EST yk322a3.5 comes from this
gene; cDNA EST yk397b2.5
comes from this gene; cDNA
EST yk348b11.5 comes from
this gene; cDNA EST

(AF010403) ALR [Homo



sapiens]

lHVP01HkTl CAL29.SKD 



PROTEIN IN HOLB-FTSG 
INTERGENIC REGION 
>gi|1787342 (AE000210) orf,
hypothetical protein
[Escherichia coli] protein in
holB 3' region [Escherichia
coli]



(AF003145) B0414.8 gene
product [Caenorhabditis
elegans]



UDP-N-

ACETYLGLUCOSAMINE 2-
EPIMERASE UDP-N-
acetylglucosamine 2-epimerase
[Plasmid pWQ799]



6.9 



6.5 



3.8 



1.9 



1.9 



1.7 






Nearest Neighbor (BlastN vs. Genbank)



SEQ

ID | ACCESSION



869 



870 



871 



872 



873 



DESCRIPTION 



L03398 



D63484 



Zebrafish retinoic 



acid receptor alpha 
2.A 



Human mRNA for 
KIAA0150 gene,
partial cds 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)
ACCESSION 



DESCRIPTION 



0.007 



M31483 



AF090115



AF064029 



Maize glyceraldehyde 
3-phosphate
dehydrogenase, 3' 
end. 



Lycopersicon 
esculentum cytosolic 
class II small heat 
shock protein HCT2 
(HSP 17.4) mRNA, 
complete cds 



Helianthus tuberosus 
lectin 1 mRNA, 
complete cds 



0.007 



0.007 



2239219 



19917 



P VALUE 



(Z97210) hypothetical protein



(Z14014) Pistil extensin like 
protein, partial CDS only 



543068 



mucin, tracheobronchial - dog 
>gi|402558



0.007 



2494941



0.007 



ALPHA-2B ADRENERGIC 
RECEPTOR adrenoceptor 
[Cavia porcellus] 
>gi|1587159|prfp206293B 
adrenoceptor alpha2B [Cavia 
porcellus]



1110587 



(S79410) nuclear localization 
signals Peptide, 140 aa] [Mus 
sp.] 



0.77 



0.61 



0.45 



0.42 



0.26 



874 



875 



876 



877 



XS8931 



S74155 



H.sapiens PAL2A 
gene



0.007 



zRAR alpha =retinoic 
acid receptor alpha 
[zebrafish, embryos, 
mRNA, 1773 nt]



1706176 



CUTINASE TRANSCRIPTION
FACTOR 1 ALPHA
>gi|1262912 (U51671) cutinase
transcription factor 1 [Fusarium
solani f. sp. pisi]



M74193 



U03673 



878 D37766 



Petromyzon marinus 
plasma albumin 
mRNA, complete cds 



Saccharomyces 
cerevisiae Spp41p
(SPP41) gene,
complete cds. 



Homo sapiens mRNA 
for Laminin-5 beta3 
chain, complete cds 



0.007 



0.007 



2239219 (Z97210) hypothetical protein 






730888 



OCTAPEPTIDE-REPEAT
PROTEIN T2 



0.007 



3820885 



(AL033126) 65G3.k 
[Drosophila melanogaster] 



0.007 



1235974 



(X96713) collagen [Globodera
pallida]



0.21 



0.11 



0.011 



0.001 



3e-06 







Nearest Neighbor (BlastN vs. Genbank) 



879 | AF022388



880 | U89984



DESCRIPTION
Caenorhabditis 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



elegans putative 
transcription factor 
MAB-3 (mab-3) 
gene, complete cds 



Acanthamoeba 
castellanii 
transformation- 
sensitive protein 
homolog mRNA,
complete cds 



0.007 



3747107 



Homo sapiens mRNA
for KIAA0882
protein, partial cds
Mus musculus
Pontin52 mRNA,
AF100694 complete cds



881 | AB020689



882 



Arabidopsis thaliana
cellulose synthase
catalytic subunit (Ath-
A) mRNA, complete
883 | AF027173 cds



884 



U76524 



Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds



0.007 



1890281 



0.007 



0.006 



3880809 



<NONE> 



P VALUE 



(AF095741) unknown [Rattus 
norvegicus]



(U89984) transformation-
sensitive protein homolog



rabGAP domains; cDNA EST
EMBL:D34945 comes from this
gene; cDNA EST
EMBL:D27313 comes from this
gene; cDNA EST
EMBL:D34829 comes from this
gene; cDNA EST
EMBL:D27312 comes from this
gene; cDNA ... Probable
rabGAP domains; cDNA EST
EMBL:D34945 comes from this
gene; cDNA EST
EMBL:D27313 comes from this
gene; cDNA EST
EMBL:D34329 comes from this
gene; cDNA EST 
EMBL:D27312 comes from this 
gene; cDNA ... 



<NONE> 



0.006 



<NONE> 



0.006 



<NONE> 



<NONE> 



<NONE> 



5e-09 



2e-09 



1e-23



<NONE> 



<NONE> 



<NONE>






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



885 



886 



887 



P VALUE 



U76524 



AJ005813 



Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds



Arabidopsis thaliana 
mRNA for 
neoxanthin cleavage 
enzyme 



0.006 



AB012106



888 | M80529



Brassica rapa mRNA
for SRK45, complete
cds 



Rattus norvegicus 
ceruloplasmin gene, 
exon I and 5' flank 



0.006 



0.006 



889 



890 



891 



892 



AF027173



Arabidopsis thaliana
cellulose synthase
catalytic subunit (Ath-
A) mRNA, complete
cds



Sambucus nigra lectin
precursor mRNA,
U76523 complete cds



Rattus norvegicus
homer-1c mRNA,
AF093268 complete cds



0.006 



0.006 



0.006 



Brassica rapa mRNA 
for SRK45, complete
AB012106 cds



893



Arabidopsis thaliana
mRNA for
neoxanthin cleavage
AJ005813 enzyme



894 



0.006 



0.006 



0.006 



Rattus norvegicus
homer-1c mRNA,
AF093268 complete cds



Rattus norvegicus
homer-1c mRNA,
895 | AF093268 complete cds



896



M.musculus mRNA
X993S4 for paladin gene



0.006 



0.006 



0.003 






Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



DESCRIPTION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 




<NONE> 



<NONE> 



<NONE> 



<NONE> 



99408 



hypothetical protein 6 - 
Chlamydomonas reinhardtii 
transposon 

>gi|1360717|gnl|PID|e33461
reinhardtii]



4039024



(AF039110) polyprotein
[Rubella virus]



160533 



(M9442S) merozoite surface 
antigen 1 [Plasmodium vivax]



4019458 



1916976 



(U91682) vitelline membrane 
protein homolog [Aedes 
aegypti] 



102059 



promastigote surface antigen-2 
(clone 4.6) - Leishmania major 
(fragment) >gi|9583 (X57135) 
surface antigen P2 [Leishmania 
major]



3171241 



<NONE> 



<NONE> 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



9.6 



9.3 



7.5 



(AF093984) envelope 
glycoprotein [Human 
immunodeficiency virus type 1] | 7.0



6.S 



2.4 



(AF067204) transcription factor 
BF-1 [Danio rerio] | 1.0



<NONE> 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)




ACCESSION 



DESCRIPTION 



897 | AF027174



898 | AE001148



P VALUE 



ACCESSION 



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
B) mRNA, complete 
cds 



0.003 



899 | AF027173



900 | U72396



901 | AF100694



Borrelia burgdorferi
(section 34 of 70) of
the complete genome



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
A) mRNA, complete 
cds 



0.003 



0.003 



Lycopersicon 
esculentum class II 
small heat shock 
protein Le-HSP17.6 
mRNA, complete cds
Mus musculus
Pontin52 mRNA,
complete cds
Chlamydomonas



0.002 



0.002 



902 | AF104631



903 | AF100694



904 | AB012106



reinhardtii light 



protein precursor 
(Lhcb3) mRNA, 
complete cds 



905 | M21339



906 | AF012899 



Mus musculus 
Pontin52 mRNA, 
complete cds 
Brassica rapa mRNA 
for SRK45, complete 

cds 

Human non-histone
chromosomal protein 
HMG-14 gene, 
complete cds. 



0.002 



Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds 



0.002 



0.002 



DESCRIPTION 



<NONE> 



0.002 



P VALUE 



4160388 



0.002 



1709213 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



(AJ011856) ORF Q0255
[Saccharomyces cerevisiae]



NUCLEAR ENVELOPE PORE 
MEMBRANE PROTEIN POM 
121 (PORE MEMBRANE 
PROTEIN OF 121 KD) (P145) 



<NONE> 



7.6 



1.5 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



X57103 



Human h-lys gene for 



lysozyme (upstream 



P VALUE 



region)



908



Sambucus nigra
hevein-like protein
AF074386 mRNA, complete cds



Human CD4
promoter, partial
U01066 sequence.



Barley mRNA
910 | L28094 sequence.



911 



Homo sapiens DNA 
from chromosome 19 
cosmid f19399 (~17
kb EcoRI restriction 



AD000833 (fragment) 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



0.002 



0.002 



0.002



0.002 



Homo sapiens TRHR 
gene promoter and
912 | AJ011701 exons 1-2, partial



913



Mus musculus
Pontin52 mRNA,
AF100694 complete cds



914 AF037062 



915 



Homo sapiens retinol 
dehydrogenase gene, 
complete cds 



Rattus norvegicus 
homer-1c mRNA,



AF093268 complete cds



916 



U67608 



917 



918 



919 



AF027173 



Z46736 



AB012106 



Methanococcus
jannaschii section 150 
of 150 of the 
complete genome 



0.002 



0.002 



0.002 



0.002 



_ 



Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath 
A) mRNA, complete 
cds 



0.002 



H.sapiens DNA for 
repeat region (ABM- 
C82) 



Brassica rapa mRNA 
for SRK45, complete
cds 



0.002 



0.002 



0.002 



i 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



DESCRIPTION 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



P VALUE 




<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 








Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






X.laevis mRNA for










920 


Z85983 


NOVA protein 


0.002 


<NONE> 


<NONE>


<NONE> 


921


AF027173 


Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath-
A) mRNA, complete

cds 


0.002 


<NONE> 


<NONE> 


<NONE> 


922 


S61977 


medium-chain acyl-
CoA dehydrogenase
{exon 10, intron 10}
[human, genomic,
1407 nt]


0.002 


< * 

<NONE> 


<NONE> 


<NONE> 


923 


AJ005813


Arabidopsis thaliana 
mRNA for 
neoxanthin cleavage 
enzyme 


0.002 


<NONE> 


<NONE> 


<NONE> 


924 


AB012105


Brassica rapa mRNA
for SLG45, complete 
cds 


0.002 


<NONE> 


<NONE> 


<NONE> 


925 


AB012106


Brassica rapa mRNA
for SRK45, complete 
cds 


0.002 


<NONE> 


<NONE> 


<NONE> 


926 


AF027173 


Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
A) mRNA, complete
cds 


0.002 


<NONE> 


<NONE> 


<NONE>


927 


X51646


H.sapiens DNA for 
dopamine D2 
receptor gene 




lino i 


(AE001337) YopOGen 
Secretion Protein D [Chlamydia
trachomatis]




928 


j 

AF100694


Mus musculus 
Pontin52 mRNA, 
complete cds 


0.002 


465762 


HVPUlKbllLAL iU.i iOJ 
PROTEIN C06G4.1 IN
CHROMOSOME III 
>gi|630524|pir||S44748 
C06G4.1 protein - 
Caenorhabditis elegans 
>gi|409292 (L25598) homology 
with vigilin; coded for by C. 
elegans cDNA 

GenBank:MS8954 (CEL12C9); 
putative [Caenorhabditis 


8.9 


929 


U4847S 


Human skeletal 
muscle ryanodine 
receptor gene


0.002 


2137221 


co-repressor protein - mouse 
>si|64:6l9 


6.9 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION


DESCRIPTION 


P VALUE 




Mus musculus 


1 






930 


AF100694


Pontin52 mRNA, 
complete cds 


0.002 


806536 


(Z22520) membrane protein 
[Bacillus acidopullulyticus] 


6.3 


931 


AF100694 


Mus musculus 
Pontin52 mRNA, 
complete cds 


0.002 


f — 

3881055 


(AL023844) Y48A6B.1 
[Caenorhabditis elegans] 


5.8 


932 


AF090115


Lycopersicon 
esculentum cytosolic 
class II small heat 
shock protein HCT2 
(HSP17.4) mRNA,
complete cds 


0.002


JO f O J J\J 


(Z81097) K07A1.4 
[Caenorhabditis elegans] 


4.8 


933 


AF093268 


Rattus norvegicus 
homer-1c mRNA,
complete cds 


0.002 


137640 


REPLICATION PROTEIN E1
papillomavirus 


4.0 


934 


AF019660 


Mus musculus 
nuclear orphan 
receptor RORgamma 


0.002 


1330365 


(U58757) similar to nucleotide 
pyrophosphatases 


3.9 


935 


AF100694


Mus musculus 
Pontin52 mRNA, 
complete cds 


0.002 


1785972 


(U46951) ORF5; Method:
conceptual translation supplied 
by author 


3.7 


936 


V00508


Human gene for 
epsilon-globin.


0.002 


1333804 


(X56082) protease 
[Ruminococcus flavefaciens] 


3.5 


937 


AB012105 


Brassica rapa mRNA
for SLG45, complete
cds 


0.002 


4153876 


(AC005531) similar to mouse 
homeodomain-interacting 
protein kinase 2; similar to 
AF077659 (PID:g3702958) 


3.0 


938 


AJ005813 


Arabidopsis thaliana 
mRNA for 
neoxanthin cleavage 
enzyme


0.002 


1070461 


ornithine carbamoyltransferase
(EC 2.1.3.3) - yeast
(Saccharomyces cerevisiae) 
>gi|929866 (X83502) 
pid:e130025 [Saccharomyces
cerevisiae] >gi|1008256


2.8 


939 


S41458 


rod cGMP 
phosphodiesterase
beta-subunit [human,
mRNA, 3231 nt]


0.002


3450883 


(AF083334) fibroin [Antheraea 
pernyi] 


1.6 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



Drosophila



940 



X06286 



941 



AF015812 



942 



X78925 



943 



AF074386 



944 



Z69639 




melanogaster Gart 
locus with genes for 
GARS=phosphoribosylamine-glycine
ligase,
AIRS=phosphoribosylformylglycinamidine
cyclo-ligase,
GART=glycinamide
ribotide
transformylase >
gb|J02527|DROGART
D. melanogaster
Gart gene encoding
two polypeptides with
GAR synthase, AIR
synthase, and GAR
transformylase
enzyme activities and 
a pupal cuticle gene 
nested within intron 
A of the Gart gene. 



Homo sapiens RNA 
helicase p68
(HUMP68) gene,
complete cds



H.sapiens HZF2 
mRNA for zinc finger 
protein 



Sambucus nigra 
hevein-like protein 
mRNA, complete cds 



Human DNA 
sequence from 
cosmid L241B9,
Huntington's Disease 
Region, chromosome 
4p16.3 contains
polymorphic VNTR 
pYNZ32. 






Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ
ID


ACCESSION


DESCRIPTION


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 


— 1 




« 






945 


AF074387 


Sambucus nigra 
hevein-like protein
mRNA, complete cds 


0.002 


2984161 


(AE000761) hypothetical
protein [Aquifex aeolicus] 


0.80 


946 


AF093268


Rattus norvegicus 
homer-1c mRNA,
complete cds 


0.002 


101830 


hypothetical protein B - chestnut
blight fungus 


0.72 


947 


AF017307 


Homo sapiens Ets-
related transcription
factor (ERT) mRNA, 
complete cds 


0.002 


200531 


(M18071) prion protein [Mus
musculus] 


0.72


948 


U11383 


Drosophila 
melanogaster Ovo- 
1028aa (ovo) mRNA, 
complete cds. 


0.002 


2465207 


(AF016045) OVO-like 1
binding protein [Homo sapiens] 


0.35


949 


AF012899 


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds 


0.002 


3834294 


(U80846) No definition line 
found [Caenorhabditis elegans]


0.29 


950 


AF086315 


Homo sapiens full
length insert cDNA
clone ZD52F10 


0.002 


545067 


(S68356) action potential 
broadening potassium 
channel=Shab [Aplysia, bag cell 
neurons, head ganglia. Peptide, 
905 aa] [Aplysia] 
>gi|743110|prf||2011375A K
channel [Aplysia californica]


0.15


951 


X53096


S.aureus genes
encoding Sau96I
DNA
methyltransferase and
Sau96I restriction 
endonuclease 


0.002 


2529575 


(AF018164) kinesin-like protein 
3C [Homo sapiens] 


0.11 


952 


AB012105


Brassica rapa mRNA
for SLG45, complete
cds


0.002 


729918 


LA PROTEIN HOMOLOG (LA
RIBONUCLEOPROTEIN) (LA 
AUTOANTIGEN HOMOLOG) 


0.092 


953 


X73973 


G.gallus RAR-
gamma2 mRNA for
retinoic acid receptor


0.002 



586122


TRICHOHYALIN 
>gi|423321|pir||A40691 
trichohyalin - sheep >gi|295941 
(Z18361) trichohyalin


0.073 


954 


r 

S41458 


rod cGMP
phosphodiesterase
beta-subunit [human,
mRNA, 3231 nt]


0.002


1017427


(X90569) elastic titin [Homo
sapiens]


0.013 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



955 | M35887



956 | AF034099



957 | AF033929



DESCRIPTION
D.melanogaster
defective chorion-1
fc125 (dec-1) gene,
complete cds. 



P VALUE 



0.002 



958 | AB012106



959 AF029062 



960 U70671 



Laccaria bicolor
glyoxal malate 
synthase protein 

mRNA, complete cds | 0.002
Bactrocera dorsalis
strain Tahiti 
mitochondrial D-loop 
region, complete 
sequence | 9e-04 



Brassica rapa mRNA 
for SRK45, complete 
cds | 8e-04



Homo sapiens DEAD-
box protein (BAT1)
gene, partial cds | 8e-04 



962 | X14077



963 | AC004497



964 | AF077344



965 | X85117



966 | AF100694



Human ataxin-2 
related protein 
mRNA, partial cds



8e-04 



Dendrocopos 
leucopterus clone 2 
microsatellite HrU2 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



Pea phy gene for 
phytochrome 

apoprotein | 8e-04

Homo sapiens 
chromosome 21, P1
clone LBNL#6 | 8e-04 
Homo sapiens 
cartilage-derived C-
type lectin | 8e-04



H.sapiens epb72 gene
exons 2,3,4,5,6,7



1825606 



Mus musculus
Pontin52 mRNA,
complete cds 



8e-04 



1825593 



<NONE> 



<NONE> 



<NONE> 



P VALUE 



(U88169) similar to 
molybdopterin biosynthesis
MOEB proteins [Caenorhabditis
elegans] | 0.008



(U88167) D2092.2 gene product 
[Caenorhabditis elegans] | 1e-06



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



457146 



3702123 



8e-04 25)0059 



1345S59 



<NONE> 

(L27838) rhoptry protein 
[Plasmodium yoelii] 

(AJ011707) TraD protein 

[Escherichia coli] 

(AJ004687) N-4 cytosine-
specific methyltransferase
[Neisseria gonorrhoeae]



<NONE> 



9.6 



8.5 



6.8 



COPPER TRANSPORT 
PROTEIN CTR1 transport
protein - yeast (Saccharomyces 
cerevisiae) gene product 
[Saccharomyces cerevisiae]



6.7 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



967 | AF031403



968 



969 



L29252 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



P VALUE 



ACCESSION 



Homo sapiens 



DESCRIPTION 



MLL/AF4 
translocation 
breakpoint 
t(4;11)(q21;23)



Human (clone D13-2)
L-iditol-2

dehydrogenase gene, 
exon 4, exon 5, exon 
6 and exon 7. 



X16995 



970 



M99412



Mouse N10 gene for 
a nuclear hormonal 
binding receptor 



Human interleukin-8 
receptor (IL8RB)
gene, complete cds 



971 



U37452 



972 



973



AF100694



AF093268 



Human Down
Syndrome region of 
chromosome 21 
genomic sequence, 
clone A3ID6-IC5. 



Mus musculus 
Pontin52 mRNA, 
complete cds 



Rattus norvegicus
homer-1c mRNA,
complete cds 



8e-04 



2498926 



8e-04 



8e-04 



1488070 



1493833 



8e-04 



1346101 



8e-04 



4164069 



8e-04 



8e-04 



1352877 



1788557 



SMALL PROTEIN B 
HOMOLOG A43259. from E. 
hirae [Mycoplasma 
pneumoniae]



(U63997) putative transposase 
[Enterococcus faecium] 



(U47323) stromal cell protein 
[Mus musculus] 



4-AMINOBUTYRATE 
AMINOTRANSFERASE 
TRANSAMINASE) (GABA 
AMINOTRANSFERASE) 
homolog - smut fungus
(Ustilago maydis) >gi|881562 
Emericella nidulans gamma- 
amino-n-butyrate transaminase 
Swiss-Prot Accession Number 
P14010 [Ustilago maydis] 



(AF111093) latrophilin 3 splice
variant bbah [Bos taurus] 



HYPOTHETICAL 13.0 KD
PROTEIN IN RAD26-GEF1 
INTERGENIC REGION 
>gi|1077881|pir||S57057
probable membrane protein 
YJR038c - yeast 
(Saccharomyces cerevisiae) 
>gi|1015688 (Z49538) ORF
YJR038c putative
[Saccharomyces cerevisiae]



(AE000312) orf, hypothetical
protein [Escherichia coli]



P VALUE 



6.6 



5.2 



3.2 



0.83 



0.26 



0.2: 



0.19 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















974 


X83872 


• 

H.vulgaris mRNA for 
cAMP response 
element binding 
protein 


8e-04 


1175386


HYPUTHkl'lCALJV./lCD 
PROTEIN C18B11.06 IN
CHROMOSOME I 
>gi|2130289|pir||S58305 
hypothetical protein 
SPAC18B11.06 - fission yeast
hypothetical protein 
[Schizosaccharomyces pombe] 


0.005 


975 


M32514 


Rat simple sequence 
DNA, cloneS. 


8e-04 


2394492 


(AF024502) No definition line 
found [Caenorhabditis elegans] 


0.002 


976 


AF074386 


Sambucus nigra 
hevein-like protein 
mRNA, complete cds 


8e-04 


2981631 


(AB012223) ORF2 [Canis 
familiaris] 


0.001


977


X89211 


H.sapiens DNA for 
endogenous retroviral 
like element 


8e-04 


2065210 


(Y12713) Pro-Pol-dUTPase
polyprotein 


3e-04 




U14391 


Human myosin-IC
mRNA, complete cds. 


8e-04 


3142302 


(AC002411) Strong similarity to
myosin heavy chain gb|Z34293 
from A. thaliana. [Arabidopsis 
thaliana] 


4e-16 


979


L13612 


Drosophila 
melanogaster dead- 
box protein 

D. melanogaster 

DEAD-box gene, 
complete CDS 


8e-04 


3776027 


(AJ010475) RNA helicase 
[Arabidopsis thaliana] 


9e-24 


980 


AF074386 


Sambucus nigra 
hevein-like protein 
mRNA, complete cds 


7e-04 


<NONE> 


<NONE> 


<NONE> 


981 


AF100694


Mus musculus
Pontin52 mRNA,
complete cds 


7e-04 


<NONE> 


<NONE> 


<NONE> 


982 


AF093268 


Rattus norvegicus
homer-1c mRNA,
complete cds 


7e-04 


<NONE> 


<NONE> 


<NONE> 


983 


2739S7 


Human DNA 
sequence from 
cosmid N120B6 on 
chromosome 22 
Contains ESTs, 
complete sequence 
[Homo sapiens]


7e-04 


<NONE> 


<NONE> 


<NONE> 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






Brassica rapa mRNA 










984 


AB012106 


for SRK45, complete 

cds


7e-04 


<NONE> 


<NONE> 


<NONE>


985 


AF093268 


Rattus norvegicus 
homer-1c mRNA,
complete cds 


7e-04 


<NONE> 


<NONE> 


<NONE>


986 


AF027174 


Arabidopsis thaliana
cellulose synthase 
catalytic subunit (Ath- 
B) mRNA, complete 
cds 


7e-04 


<NONE>


<NONE> 


<NONE> 


987 


AF100694


Mus musculus 
Pontin52 mRNA, 
complete cds 


7e-04 


<NONE> 


<NONE> 


<NONE> 


988


AJ005813


Arabidopsis thaliana 
mRNA for 
neoxanthin cleavage
enzyme 


7e-04 


<NONE> 


<NONE> 


<NONE> 


989 


AF064029 


Helianthus tuberosus 
lectin 1 mRNA, 
complete cds 


7e-04 


<NONE> 


<NONE> 


<NONE> 




AF027174 


Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
B) mRNA, complete 
cds 


7e-04 


<NONE> 


<NONE> 


<NONE> 


991 


AF027173 


Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
A) mRNA, complete 
cds 


7e-04 


<NONE> 


<NONE> 


<NONE> 


992 


AF064029 


Helianthus tuberosus 
lectin 1 mRNA,
complete cds 


7e-04 


<NONE> 


<NONE> 


<NONE> 


993 


AF100694


Mus musculus
Pontin52 mRNA, 
complete cds 


7e-04 


<NONE> 


<NONE> 


<NONE> 


994 


U76524


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds 


7e-04


3327230 


(AB014608) KIAA0708 protein
[Homo sapiens]


9.5 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















995


U76524 


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds


7e-04 


3327230 


(AB014608) KIAA0708 protein
[Homo sapiens]


9.3 




AF074387 


Sambucus nigra 
hevein-like protein 
mRNA, complete cds


7e-04 


3876455 


(Z93380) predicted using 
Genefinder; similar to 7tm 
receptor protein [Caenorhabditis 
elegans] 


7.1 


997 


U76524 


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds


7e-04 


2128771 


hypothetical protein MJ1293 - 
Methanococcus jannaschii 
>gi|1591931 (U67570) M.
jannaschii predicted coding
region MJ1293 [Methanococcus
jannaschii] 


6.2 


998 


U09412 


Human zinc finger 
protein ZNF134 
mRNA, complete cds 


7e-04 


1083336 


glutathione transferase (EC 
2.5.1.18) piA - mouse 


5.4 


999 


AF027173 


Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
A) mRNA, complete 
cds 


7e-04 


473515 


(M17619) NADH 
dehydrogenase subunit ND4 
[Asterina pectinifera] 


3.7 


1000 


AF012899 


Sambucus nigra 
ribosome inactivating 
protein precursor
mRNA, complete cds 


7e-04 


1724097


(U79772) female sex protein 
[Mercurialis annua]


3.3 


1001


AF100694 


Mus musculus
Pontin52 mRNA, 
complete cds 


7e-04


1197103 


(D49747) core, env, and part of
E2/NS1 


3.2 


1002 


X16995


Mouse N10 gene for
a nuclear hormonal 
binding receptor 


7e-04 


345372 


unc-5 protein, long form -
Caenorhabditis elegans
>gi|258529|bbs|118648
(S47168) UNC-
5=immunoglobulin and
thrombospondin type 1 
transmembrane protein 
alternatively spliced} aa] 
[Caenorhabditis elegans]
>gi|2662596 (AF03669S) C. 
elegans UNC-5 (NID:g25852) 





Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


| ACCESSION 


DESCRIPTION 


P VALUE 


I 




1 






1003 


U76524 


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds 


7e-04 


4204220 


(AB022866) mobilization 
protein 




2.5 


1004 


AF093268 


Rattus norvegicus 
homer- lc mRNA, 
complete cds 


7e-04 


3201550 


(Y17116) fibrinogen-binding
protein 


2.4 


1005 


AF074386 


Sambucus nigra 
hevein-like protein
mRNA, complete cds


7e-04 


1174264


(U45966) polyprotein [Hepatitis 
G virus] 


0.73 


1006 


AF027173 


Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
A) mRNA, complete 
cds 


7e-04 


135308 


TRANSCRIPTION FACTOR 
JUN-D 


0.065 


1007 


X98745 


H. sapiens EWS gene, 
intron 6, 
polymorphism 


7e-04 


728836 


!!!! ALU SUBFAMILY SP 
WARNING ENTRY 


0.001 


1008 


AJ005813 


Arabidopsis thaliana 
mRNA for 
neoxanthin cleavage 
enzyme


7e-04 


1633564 


(U47924) C8 [Homo sapiens] 


9e-09 


1009 


AF074386 


Sambucus nigra 
hevein-like protein 
mRNA, complete cds 


6e-04 


284171 


Ig epsilon chain C region form 3 
- human 


1.3 


1010 


AB012106 


Brassica rapa mRNA 
for SRK45, complete
cds 


6e-04 


3845262 


(AE001414) BRAHMA
ortholog (DNA helicase 
superfamily II) 


0.25 


1011 


AL034404 


Human DNA 
sequence from clone 
417C12 on 

chromosome Xp22.11-
22.2, complete
sequence [Homo 
sapiens] 


3e-04 


<NONE> 


<NONE> 


<NONE> 


1012 


M99701


Homo sapiens (pp21) 
mRNA, complete cds. 


3e-04 


<NONE> 


<NONE> 


<NONE> 
Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



1013 U00227



1014 AF074387 



1015 U95102



1016 AB012106



1017 AJ010737



P VALUE 



Ovis aries Merino
breed DR beta-chain 
antigen binding 
domain. MHC class I 
DRB (Ovar-DRB24) 
gene, partial cds. 



Sambucus nigra 
hevein-like protein 
mRNA, complete cds



Xenopus laevis 
mitotic 

phosphoprotein 90 

mRNA, complete cds 
Brassica rapa mRNA
for SRK45, complete
cds 

Mus musculus DNA
for microsatellite 3kb
upstream I bp gene 



3e-04 



3e-04 



3e-04 



3e-04 



3e-04 



1018 AF053137 



1019 AF027173



Homo sapiens histone
deacetylase 3 gene,
exons 4, 5, 6, 7, 8, 9,
and 10 



Arabidopsis thaliana
cellulose synthase 
catalytic subunit (Ath 
A) mRNA, complete 
cds 



1020 AC004173 



1021 X57025



1022 X77090 



Homo sapiens clone 
UWGC:y23x011 
from 6p21, complete
sequence [Homo 
sapiens] 



Human IGF-I mRNA 
for insulin-like 



growth factor I



H.sapiens IL-1Ra



gene. 



3e-04 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



DESCRIPTION 



P VALUE 



<NONE> 



<NONE> 



999418 



2367460 



4106549 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



(L19655) ORF [Tomato

ringspot virus] 

(AF011415) putative
pheromone receptor [Mus 
musculus] 



(AF104411) neuronal-specific
septin 3 [Mus musculus] 



NADH-DEPENDENT FLAVIN
OXIDOREDUCTASE acid-
inducible - Eubacterium sp
>gi|1381570 (U57489)
NADH:flavin oxidoreductase



3e-04 



1785789 



3e-04 



55,8521 



3e-04 



4206707 



3e-04 



1065941 



(Y08502) orfllld [Arabidopsis 
thaliana] 



(D28917) polyprotein [Hepatitis
C virus] 



(AF118122) putative outer
membrane protein OmpU 



(U40799) F42C5.7 gene product 
[Caenorhabditis elegans]



7.0 



5.5 



5.1 



0.65 



0.12 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION


DESCRIPTION 


P VALUE 







Pseudorabies virus 










1023 


M34651


with upstream and 

downstream

sequences. 


3e-04 


2746853 


(AF040650) contains similarity 
to sodium-potassium-chloride
cotransport proteins 


7e-05 


1024 


Z36011 


S.cerevisiae 
chromosome II 
reading frame ORF 
YBR142w 


3e-04 


2500537


PROBABLE ATP-
DEPENDENT RNA
HELICASE HAS1
>gi|626265|pir||S47451 
hypothetical protein YMR290c 
RNA helicase [Saccharomyces 
cerevisiae] 


4e-08


1025 


AF020286 


Dictyostelium 
discoideum 2034 
gene, partial cds 


3e-04 


1465834 


(U64857) No definition line 
found [Caenorhabditis elegans] 


6e-14


1026 


L26049 


Chlamydomonas 
reinhardtii dynein 
heavy chain alpha 
(ODA11) gene, exons
2-15, and partial cds. 


3e-04 


3876775 


(Z81077) predicted using 
Genefinder; Similarity to Yeast 
protein 8248 (TR:G587531) 


9e-15 


1027 


AF020286 


Dictyostelium 
discoideum 2034 
gene, partial cds 


3e-04 


1465834 


(U64857) No definition line 
found [Caenorhabditis elegans]


1e-17


1028


X79811 


S.cerevisiae ACT3 
gene 


3e-04 


3876090 


(^.oyojij Similarity to Yeast 
uridine kinase 

(SW:URK1_YEAST); cDNA
EST EMBL:Z14695 comes
from this gene; cDNA EST 
CEMSE17F comes from this 
gene; cDNA EST 
EMBL:D67355 comes from this 
gene; cDNA EST yk209h1.5
comes from this ge... 


7e-31


1029 


AF027173 


Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath-
A) mRNA, complete 
cds 


2e-04 


<NONE> 


<NONE> 


<NONE> 


1030 


M22970 


Human pancreatic 
phospholipase A-2 
(PLA-2) gene, exons 
1 to 3. 


2e-04 


<NONE> 


<NONE> 


<NONE>








Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















1031 


Z68686 


sequence from 
cosmid N2E9 on 
chromosome 22. 
Contains EST, 
complete sequence 
[Homo sapiens]


2e-04 


<NONE> 


<NONE> 


<NONE> 


1032 


X95154 


H. sapiens brca2 gene 
exon 4 > :: 

emb|A62779|A62779 
Sequence 20 from 
Patent WO9719110 


2e-04


<NONE>


<NONE> 


<NONE> 


1033 


AJ005813 


Arabidopsis thaliana 
mRNA for 
neoxanthin cleavage 
enzyme


2e-04 


<NONE> 


<NONE> 


<NONE> 


1034 


AF100694


Mus musculus
Pontin52 mRNA, 
complete cds 


2e-04 


<NONE> 


<NONE> 


<NONE> 


1035 


AE001415 


Plasmodium 
falciparum 
chromosome 2, 
section 52 of 73 of 
the complete 
sequence 


2e-04 


<NONE> 


<NONE> 


<NONE> 






Lycopersicon 
esculentum cytosolic 


2e-04 


<NONE> 


<NONE> 


<NONE> 


1036 


AF090115 


class II small heat
shock protein HCT2 
(HSP17.4) mRNA, 
complete cds 


1037 


AC000958 


Homo sapiens
(subclone 6_d9 from
P1 H21) DNA
sequence 


2e-04 


<NONE> 


<NONE> 


<NONE> 


1038 


AF093268 


Rattus norvegicus 
homer-1c mRNA,
complete cds 


2e-04 


2501523 


CD59 GLYCOPROTEIN 
PRECURSOR 


7.1 


1039 


U76524 


Sambucus nigra
ribosome inactivating 
protein precursor 
mRNA, complete cds 


2e-04 


2765360 


(Y13925) cathepsin L2 [Penaeus
vannamei] 


6.S 






Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 












RNA POLYMERASE




1040 


AF027174


Arabidopsis thaliana
cellulose synthase 
catalytic subunit (Ath- 
B) mRNA, complete


2e-04 


133636 


>gi|67126|pir||RRXPLC RNA- 
directed RNA polymerase (EC
2.7.7.48) - lymphocytic 
choriomeningitis virus (strain 

AnrrKtrnno S3M :>oil*n I ~\fft 




1041 


AB012106


Brassica rapa mRNA 
for SRK45, complete
cds


2e-04 


3822155 


(AF074613) type II secretion 
protein [Escherichia coli 
O157:H7]


4.0


1042 


U76524 


Sambucus nigra 
ribosome inactivating 

protein precursor

mRNA, complete cds 


2e-04 


1718125 


REGULATORY PROTEIN E2
>gi| 1020222 type 36] 


0.38


1043 


X17058


Sus scrofa mRNA for 

glucose transporter

protein 


2e-04 


3341906 


(AB009593) xylose transporter 


2e-15 


1044 


AF008216


Homo sapiens 
candidate tumor 

suppressor pp32r1


1e-04


<NONE> 




<NONE>


1045 


X98890 


S. tuberosum mRNA 

for inorganic
phosphate
transporter, StPT1


1e-04 | 624126


(U42580) a65L [Paramecium 
bursaria Chlorella virus 1] 


7.9 


1046 


L14930


VJlV^lll^, IllUA ^IXuv / Lsy 

mRNA, complete cds. 


9e-05 


<NONE> 


<NONE> 


<NONE> 


1047 


AJ009970 


Mus musculus 
thromboxane A2 
receptor gene, exon 3, 
partial 


9e-05 


<NONE> 


<NONE> 


<NONE> 


1048 


Y11896 


M.musculus mRNA 
for Brx gene, partial 


9e-05 | <NONE>


<NONE> 


<NONE>


1049 


L10832 


Polistes annularis
(clone pan48AAT)
tandem repeat region


9e-05 


<NONE> 


<NONE> 


<NONE> 


1050 


AF0550H 


Homo sapiens clone
24587 mRNA 
sequence 


9e-05 


3SS0586 


(Z/y/b8)cUNAhSl 
EMBL:D28009 comes from this 
gene; cDNA EST 
EMBL:D28008 comes from this 
gene; cDNA EST 
EMBL:D32478 comes from this 
gene; cDNA EST
EMBL:D34508 comes from this 
gene; cDNA EST 
EMBL:D37581 comes from this 
gene; ... 



7.6 








Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 














1051


U76524 


Sambucus nigra 
ribosome inactivating
protein precursor 
mRNA, complete cds 




RHODOPSIN >gi|2290717
(AF000947) rhodopsin [Sepia 
officinalis] 


6.7 


1052 


Z58294 


H.sapiens CpG DNA, 
clone 34d6, forward
read cpg34d6.ftla . 


9e-05 | 3885496


(AF064825) heparin/heparan 
sulfate N-acetylglucosaminyl N- 
deacetylase/N-sulfotransferase 
[Bos taurus]


0.65 


1053 


D87451


Human mRNA for 
KIAA0262 gene, 
complete cds 


9e-05 | 3874739


(266495) similar to claustrin 
like 


0.004 


1054 


L37092 


Mus musculus cyclin-
dependent kinase
homologue 


Op fK I "inon^ i i 


(AL022598) hypothetical 
protein 


4e-09 


1055 


AF074386 


Sambucus nigra 
hevein-like protein
mRNA, complete cds 


8e-05 | <NONE>


<NONE> 


<NONE> 


1056 


AF027174 


Arabidopsis thaliana
cellulose synthase 
catalytic subunit (Ath- 
B) mRNA, complete 
cds 


8e-05 <NONE> 


<NONE> 


<NONE> 


1057 


AF074386 


Sambucus nigra 
hevein-like protein
mRNA, complete cds 


8e-05 | <NONE>




<NONE> 


1058 


D10102 


Homo sapiens DNA 
from cosmid 
clone: 844, GT repeat 
sequence 


8e-05 | <NONE>


<NONE> 


<NONE> 


1059 


* 

U72396 


Lycopersicon
esculentum class II 
small heat shock 
protein Le-HSP 17.6 
mRNA, complete cds 


8e-05 | 1176475


HVFUTHLT1CALKU.4KJJ> 
PROTEIN IN SMC3-MRPL8 
INTERGENIC REGION 
>gi|1078237|pir||S56849
probable membrane protein
YJL073w - yeast
(Saccharomyces cerevisiae)
>gi|895898 (X88851)
hypothetical protein YJL073w
[Saccharomyces cerevisiae]


6.0 


1060 


X71934 


H.sapiens XB gene
for tenascin-X, repeat 
XIII 


8e-05 | 285207


microtubule-associated protein, 
110K tau - rat >gi|207158
(M84156) big tau [Rattus
norvegicus]


3.7



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















1061 


AF027174


Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
B) mRNA, complete
cds 


8e-05 


4049682 


(AF063866) ORF MSV092 
hypothetical protein 
[Melanoplus sanguinipes 
entomopoxvirus] 


2.1 


1062 


AF090115 


Lycopersicon 
esculentum cytosolic 
class II small heat 
shock protein HCT2 
(HSP17.4) mRNA, 
complete cds 


8e-05 


3861019 


(AJ235271) unknown 
[Rickettsia prowazekii] 


5e-14 


1063 


AF027174 


Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
B) mRNA, complete 
cds 


7e-05 


<NONE> 


<NONE> 


<NONE> 


1064 


L04193 


Human lens 
membrane protein 
(mpl9) gene, exon 
11. 


7e-05 


<NONE> 


<NONE> 


<NONE> 


1065 


X61609 


B.napus gene for 
LHC II Type III 
chlorophyll a/b 
binding protein 


7e-05 


2132314 


hypothetical protein YPR174c - 
yeast similarity to a nuclear 
lamin from C. elegans (PIR 
accession number S42257) 
[Saccharomyces cerevisiae] 


8.9 


1066 


AF064029 


Helianthus tuberosus 
lectin 1 mRNA,
complete cds 


7e-05 


2979422 


(AB006757) PCDH7 (BH- 
Pcdh)c [Homo sapiens] 


5.7 


1067 


AF027173 


Arabidopsis thaliana 
cellulose synthase 
catalytic subunit (Ath- 
A) mRNA, complete 
cds 


7e-05 


2493696 


HYPOTHETICAL 21.5 KD
PROTEIN (ORF 185)
>gi|1480440 (U34204)
ORF185; hypothetical 21.4 kD
protein [Brassica oleracea]


5.2 


1068 


AF093268 


Rattus norvegicus 
homer- lc mRNA, 
complete cds 


7e-05


2501029 


PROBABLE LEUCYL-TRNA
SYNTHETASE,
MITOCHONDRIAL 
PRECURSOR (LEUCINE-- 
TRNA LIGASE) (LEURS) 
KIAA0028 [Homo sapiens] 


1.4 








Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






Human DNA 










1069 


Z68758 


sequence from 
cosmid cN85E10 on
chromosome 22q11.2-
qter 


3e-05 


<NONE> 


<NONE> 


<NONE> 


1070 


X60653 


human Histone H3.3 
pseudogene (CIR- 
456) 


3e-05 


<NONE>


<NONE> 


<NONE> 


1071 


Z58294 


H.sapiens CpG DNA, 
clone 34d6, forward 
read cpg34d6.ftla . 


3e-05 


1706241


GUANYLYL CYCLASE GC-E 
PRECURSOR cyclase receptor 
[Mus musculus] 


9.6 


1072 


AF043251


Homo sapiens 
mitochondrial outer 
membrane protein 
(Tom40) gene, 
nuclear gene 
encoding 
mitochondrial 
protein, exons 1 
through 6 


3e-05 


113980 


AMINE OXIDASE [FLAVIN- 
CONTAINING] B oxidase 
(flavin-containing) (EC 1.4.3.4) 
B - human B [human, platelet, 
Peptide Partial, 520 aa] [Homo 
sapiens] 


8.9 


1073 


M31104 


Chicken progesterone 
receptor gene, 
encoding forms A and 
B. exons 1 and 2. 


3e-05 


• 

1170841 


IG GAMMA LAMBDA 
CHAIN V-II REGION 


4.S 


1074 


AF012899


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds 


3e-05 


543684 


ribosomal protein S3 - 
Chlamydomonas humicola 
chloroplast (fragment) 


4.2 


1075 


L22206 


Human vasopressin
receptor V2 gene,
complete cds. 


3e-05 


791207 


(U20615) Gnot1 homeodomain
protein [Gallus gallus] 


1.8 


1076 


AF093268 


Rattus norvegicus 
homer-1c mRNA,
complete cds 


3e-05 


3237340 


(AF033361) polyprotein 
[Hepatitis C virus] 


0.94 


1077 


AF100694


Mus musculus
Pontin52 mRNA, 
complete cds 


3e-05 


2879805 


(AL021813) hypothetical
protein 


0.001 


1078 


AF100694 


Mus musculus
Pontin52 mRNA, 

• 

complete cds 


3e-05 


3877951 


(ZS1555) predicted using 
Genefinder 


3e-07 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 




ACCESSION


DESCRIPTION


P VALUE


ACCESSION


DESCRIPTION


P VALUE
















1079 


AF090115


Lycopersicon 
esculentum cytosolic 
class II small heat 
shock protein HCT2 
(HSP17.4) mRNA, 
complete cds 


2e-05 


<NONE> 


<NONE>


<NONE> 


1080 


AF064029 


Helianthus tuberosus 
lectin 1 mRNA, 
complete cds 


2e-05 


3880197 


(Z81132) predicted using
Genefinder 


2.4 


1081 


AF087989 


Homo sapiens full 
length insert cDNA 
clone YX29D10 


2e-05 


113667 


!!!! ALU CLASS B WARNING 
ENTRY !!!! 


1.8


1082 


AF064029 


Helianthus tuberosus 
lectin 1 mRNA,
complete cds 


2e-05 


474896 


(L31967) mating type protein 
[Coprinus cinereus] 




1083 


AF064029 


Helianthus tuberosus 
lectin 1 mRNA, 
complete cds 


2e-05 


2266988 


(Y13274) M33 polycomb-like
protein [Mus musculus]


0.62


1084 


U67415 


Equus caballus UCD- 
E-CA-467 
dinucleotide repeat 
region, complete 
sequence 


1e-05


<NONE> 


<NONE> 


<NONE> 



1085 


X67277 


H.sapiens BGP gene 
for biliary 
glycoprotein, 
promoter region and 
exon 1 


1e-05


<NONE> 


<NONE> 


<NONE> 


1086 


X85117 


H.sapiens epb72 gene 
exons 2,3,4,5,6,7


1e-05


<NONE> 


<NONE> 


<NONE> 


1087 


U88328 


Mus musculus 
suppressor of 
cytokine signalling-3


1e-05


443877 


I ) core region, 
pid:g443877 [Hepatitis C
virus]


3.9 


1088 


Y12853 


Homo sapiens P2X7 
gene, exon 4-8


1e-05


3878726 


(Z66498) similar to cuticle 
collagen; cDNA EST 
EMBL:D75584 comes from this 
gene


0.36 


1089


AE001140


Borrelia burgdorferi 
(section 26 of 70) of 
the complete genome


1e-05


3860719 


(AJ235270) GLUTAMYL- 
tRNA AMIDOTRANSFERASE 
SUBUNIT A (gat A) [Rickettsia 
prowazekii]


4e-15 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















1090 


AJ224112 


Homo sapiens gamma 
adaptin gene, exon 2
and flanking intronic
sequences 


9e-06 


<NONE> 


<NONE> 


<NONE> 


1091 


AB000565 


Homo sapiens DNA 
for repeat sequence 
Alu . 


9e-06 


72879 


translation initiation factor IF-2 -
Escherichia coli 


5.1 


1092 


Z78985 


chromosome 6 
HindIII fragment,
SC6pA20B4 


9e-06 


159975 


(M65164) 51C surface protein
[Paramecium tetraurelia] 


4.8 


1093 


Z21677 


Thermotoga maritima 
DNA for spc operon 


9e-06 


585879 


50S RIBOSOMAL PROTEIN 
L2 maritima >gi|437926 
(Z21677) ribosomal protein L2 


7e-14 


1094 


a cm 1 A QA 


Drosophila hydei 
Dhc7 (Threads) 
mRNA, complete cds


9e-06


729377 


DYNEIN BETA CHAIN,
CILIARY sea urchin
(Anthocidaris crassispina) chain 
[Anthocidaris crassispina] 


4e-18 


1095 


AF051315 


Homo sapiens 
placental protein 

1 /a J \rr 1 / J ITlKiN A, 

complete cds 


4e-06 


<NONE> 


<NONE> 


<NONE> 


1096 




Homo sapiens 
(subclone 2_f4 from 
BACH 107) DNA 
sequence 


4e-06


2648j04 


(AE000952) ISA1214-6,
putative transposase 


6.2 


1097 


X85030 


H.sapiens mRNA for
skeletal muscle- 
specific calpain 


4e-06 


4239857 


(AB016726) calpain 
[Schistosoma japonicum]


0.006 


1098 


M75162 


Human polymorphic
arylamine N-
acetyltransferase


3e-06 


<NONE> 


<NONE> 


<NONE> 


1099 


AB009999 


Rattus norvegicus
mRNA for CDP-
diacylglycerol
synthase, complete
cds


3e-06 


3879045 


(Z70309) R102.6
[Caenorhabditis elegans]


7.3 


1100 


Z78985


H.sapiens flow-sorted
chromosome 6
HindIII fragment,
SC6pA20B4 


3e-06 


266529 


MERCURIC REDUCTASE 
(HG(II) REDUCTASE) 
>gi|418744|pir||S3016S 
mercury(II) reductase 


6.5 


1101


AB012190 


Homo sapiens mRNA 
for Nedd8-activating
enzyme hUba3,
complete cds 


3e-06 


3877938 


(Z79697) F58HIO.I 
[Caenorhabditis elegans]


6.3 






Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ
ID


ACCESSION


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 


1102


AF041056


Homo sapiens
WSCR4 gene, exons
3 and 4


3e-06 


1568583 


(Z80775) hypothetical protein 
Rv0044c 


1.9 


1103


X00777 


Mouse E(d) beta gene 
5' flanking region and 
exon 1 


3e-06 


1680722 


(U72497) fatty acid amide 
hydrolase [Rattus norvegicus] 


0.008 


1104 


D21205 


Human mRNA for
estrogen responsive 
finger protein, 
complete cds 


3e-06 


563127 


(U09825) acid finger protein 
[Homo sapiens] 


1e-05


1105


Z47046


Human cosmid 
QLL2C9 from Xq28 


1e-06


<NONE>


<NONE> 


<NONE> 


1106 


L26261 


Human MHC class III 
HLA-RP1 gene. 


1e-06


<NONE> 


<NONE> 


<NONE> 


1107 


M13402


Rat 5S RNA gene, 
clone 5S-2. 


1e-06


<NONE>


<NONE> 


<NONE> 


1108 


X68793 


H.sapiens gene for 
antithrombin III 


1e-06


<NONE> 


<NONE> 


<NONE> 


1109 


AF003540 


Homo sapiens 
Krueppel family zinc 
finger protein 


1e-06


2507553


ZINC FINGER PROTEIN 33A 
(ZINC FINGER PROTEIN 
KOX31) (KIAA0065)
(HA0946) Kruppel-related. 
[Homo sapiens] 


0.098 


1110 


L42096 


Homo sapiens 
(subclone 10_d2 from 
P1 H21) DNA
sequence. 


1e-06


1330401 


(U58762) T27F7.1 gene product 
[Caenorhabditis elegans]


0.015 


1111 


Z69925 


Human DNA
sequence from
cosmid cN116A5,
between markers
D22S280 and
D22S86 on
chromosome 22q12
contains EST 


9e-07 


<NONE> 


<NONE> 


<NONE> 


1112


D90217 


S. cerevisiae gene for 
YmL33, 
mitochondrial 
ribosomal proteins of 
large subunit 


9e-07

3879097 


(Z81109) predicted using
Genefinder; similar to 
sodium/phosphate transporter; 
cDNA EST yk326f6.3 comes 
from this gene; cDNA EST 
yk326f6.5 comes from this gene 
[Caenorhabditis elegans] 


7.1 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



P VALUE 



1113 



1114 



1115 



1116 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



AF012899 



AF086562 



L39062 



Z69364



Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds 



Homo sapiens full 
length insert cDNA 
clone ZE16C03 



Homo sapiens 
interleukin 9 receptor
IL9R pseudogene,

exons 1-9 



Human DNA
sequence from
cosmid L96F8,
Huntington's Disease
Region, chromosome
4p16.3 contains EST
and cDNA. > ::
emb|Z69365|HSL96F8A
Human DNA
sequence from
cosmid L96F8,
Huntington's Disease
Region, chromosome
4p16.3 contains EST
and cDNA.



DESCRIPTION 



9e-07 



4e-07 



4e-07 



4e-07 



1330345 



1072210 



(lbti/y>) coded for by C.



P VALUE



elegans cDNA ykj4bio; coded
for by C. elegans cDNA
yk13h10.5; coded for by C.
elegans cDNA yk46e8.5; coded
for by C. elegans cDNA
yk46d5.5; coded for by C.
elegans cDNA yk43c2.5; coded
for by C. elegans cDNA
yk46e8....



(U40945) coded for by C. 
elegans cDNA yk74b9.3; coded
for by C. elegans cDNA
yk74b9.5; similar to repeat of 
calcium channel alpha subunits; 
similar to tetracycline resistance 
protein; similar to hypothetical 
protein in HSP30-PMPI region 
(SP... 




3879983 



similar to
transforming protein etc2;
cDNA EST EMBL:D34137
comes from this gene; cDNA
EST EMBL:D37172 comes
from this gene; cDNA EST
EMBL:D76266 comes from this
gene; cDNA EST



gene; cDNA 



3493176 



2e-29 



3.9 



3.3 






(AF022SS9) latent TGF beta 
binding protein [Mus musculus]



3.0








Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION


DESCRIPTION 


P VALUE


ACCESSION


DESCRIPTION 


P VALUE 






1117


D79986


Human mRNA for
KIAA0164 gene,
complete cds


4e-07 


4038031


(AC005936) hypothetical 
protein [Arabidopsis thaliana] 


0.30 


1118 


D43950 


Human mRNA for
KIAA0098 gene, 
partial cds 


3e-07 


<NONE> 


<NONE> 


<NONE> 


1119 


AF037168 


Arabidopsis thaliana 
DnaJ homologue 
(AU6) mRNA, 
complete cds 


3e-07


3881075


(AJ-Uj^to / ) predicted using 
Genefinder; similar to DnaJ 
domain ; Thioredoxin; cDNA 
EST yk433f3.5 comes from this 
gene; cDNA EST 
EMBL:D32359 comes from this 
gene;cDNA EST 
EMBL:D34721 comes from this 
gene; cDNA EST yk433f3.3 c... 


3e-09 


1120 


X69838 


H. sapiens mRNA for 
G9a 


3e-07


3873414


(U00043) similar to D. 
melanogaster trithorax protein 


3e-29 




1121


AB011124


Homo sapiens mRNA 
for KIAA0532 
protein, complete cds 


2e-07


2618749 


(U90880) hypothetical protein 
2; predicted using XGrail 


2.0 


1122 


K03012 


Human cellular fms 
proto-oncogene, 
partial cds. 


1e-07


<NONE> 


<NONE> 


<NONE> 


1123


AB016195 


Homo sapiens DNA, 
microsatellite and Alu 
repeat region 


1e-07


728837 


!!!! ALU SUBFAMILY SQ 
WARNING ENTRY 


0.095 


1124 


Y16795


Homo sapiens
psihHaA pseudogene 


4e-08 


<NONE> 


<NONE> 


<NONE> 


1125


AB012624 


Homo sapiens FLI1
gene for ERGB
transcription factor,
intron 4 and partial 
cds 


4e-08 


728836 


!!!! ALU SUBFAMILY SP 
WARNING ENTRY 


3.6 


1126 


AJ131341 


Homo sapiens oggl 
gene, exons 1-7 


4e-08 


113668


!!!! ALU CLASS C WARNING
ENTRY !!!! 


3e-05 


1127 


L81902 


Homo sapiens 
(subclone 1_c10 from
P1 H69) DNA
sequence 


3e-08


4225950 


(AJ132701) centaurin gamma1B


1.8 


1128 


Y17968


Gallus gallus mRNA
for high mobility 
group 1 protein 


3e-08


3041855 


(AC004537) similar to tumor 
suppressor p33ING1; similar to
AF044076 (PID:g2829208)
[Homo sapiens]


3e-31


1129 


Y13901


Homo sapiens FGFR- 
4 gene 


1e-08


<NONE> 


<NONE> 


<NONE> 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



1130 L22024



1131 AF012899



1132 X14034



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



Mesocricetus auratus 
serum amyloid P 
component gene, 
complete cds. | 1e-08

Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds | 1e-08



Human mRNA for
phospholipase C > ::
gb|M37238|HUMPLC
Human
phospholipase C
mRNA, complete cds. | 1e-08



1133 Z59381



1134 L81839



1135 X14448



1136 AL023774



1137 



1138 



X64639 
U97058 



H.sapiens CpG DNA,
clone 152b10,
forward read
cpg152b10.ft1a.
Homo sapiens 

(subclone 2_h3 from 

P1 H43) DNA

sequence 



1e-08



1e-08



Human GLA gene for 

alpha-D-galactosidase 

A (EC 3.2.1.22)
Human DNA 

sequence from clone 

799F15 on

chromosome Xq25, 

complete sequence 

[Homo sapiens]



1e-08



H.sapiens DNA
repetitive 
subtelomeric-like 
sequence (522 bp) 

Human HuD gene,
5'UTR 



1e-08



1e-08
5e-09 



DESCRIPTION 



<NONE> 



<NONE> 



P VALUE 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



3334427 



<NONE> 



<NONE> 



<NONE> 



MJ1207 Methanococcus
jannaschii >gi|1591337
(U67562) protease synthase and
sporulation negative regulator
ail, putative [Methanococcus
jannaschii]



<NONE> 



<NONE> 



<NONE> 



<NONE> 



9.1 



1354935 



77356 
33878S6 



(U58330) probable copper- 
transporting atpase 



hypothetical 70K protein 
eggplant mosaic virus 



(AF070530) unknown [Homo 
sapiens] 



1.2 



0.098 



9.5 






Nearest Neighbor (BlastN vs. Genbank)



SEQ 
ID 



ACCESSION 



DESCRIPTION 



1139 



1140 



1141 



1142 



Z82181 



1144 



1145 



AJ006587 



Y11108



AE001223 



1143 Z47046



AG000746 



Human DNA 

sequence from 
cosmid E86D10 on
chromosome 22. 
contains ESTs, 
exontrap, complete 
sequence 



P VALUE 



Mus musculus mRNA 
for translation 
initiation factor eIF2 
gamma X 



H.sapiens WNT8B 
gene 



Treponema pallidum 
section 39 of 87 of 
the complete genome



Human cosmid
QLL2C9 from Xq28 



Homo sapiens
genomic DNA, 21q
region, clone: 
171Bm40 



M74002 



1146 U95094 



Human arginine-rich 
nuclear protein 
mRNA, complete cds 



Xenopus laevis XL- 
INCENP (XL- 
INCENP) mRNA. 
complete cds 



5e-09 



5e-09 



4e-09 



4e-09 



4e-09 



4e-09 



4e-09 



2e-09



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



DESCRIPTION 



P VALUE 



728831 



!!!! ALU SUBFAMILY J 
WARNING ENTRY



1872200 



(U22376) alternatively spliced 
product using exon 13A 



2854198 



(AF045646) contains similarity 
to collagens 



3334189 



CELL DIVISION PROTEIN 
FTSY HOMOLOG 



104045 



fibroblast growth factor receptor 
Al precursor - African clawed 
frog >gi|214894 (M55163)
fibroblast growth factor receptor 
[Xenopus laevis] 



113666 



!!!! ALU CLASS A WARNING 
ENTRY !!!!



3875371 



arginine rich domain, possesses 
weak similarity with the RNA 
binding domains from RNA 
splicing factor U2AF 65 KD 
subunit; cDNA EST 
EMBL:D64658 comes from this 
gene;cDNA EST 
EMBL:D66829 comes f... 
>gi|3878699|gnl|PID|e1351700
possesses weak similarity with 
the RNA binding domains from 
RNA splicing factor U2AF 65 
KD subunit; cDNA EST 
EMBL:D64658 comes from this 
gene; cDNA EST 
EMBL:D66829 comes f...



8.4 



0.64 



4.0 



1.5 



1.3 



0.33 



3e-06 



2494337 



ENDO-1,4-BETA-XYLANASE
(PRECURSOR sp.] 



4.9 






Nearest Neighbor (BlastN vs. Genbank) 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



SEQ 
ID 



ACCESSION 



DESCRIPTION 



P VALUE 



ACCESSION 



DESCRIPTION 



P VALUE 



1147 U20554



Drosophila
melanogaster UDP-
glucose:glycoprotein
glucosyl transferase
mRNA, complete cds



2e-09



2499087



UDP-GLUCOSE:GLYCOPROTEIN

GLUCOSYLTRANSFERASE 
PRECURSOR (DUGT) 
glucosyl transferase - fruit fly 
(Drosophila sp.) 
glucosyl transferase precursor 
[Drosophila melanogaster] 



4e-24 



1148 Z56162



H.sapiens CpG DNA, 
clone 91c9, forward
read cpg91c9.ft1a.



1e-09



<NONE>



<NONE> 



<NONE> 



1149 AF100694



Mus musculus 
Pontin52 mRNA, 
complete cds 



1e-09



1002424 



(U25739) YSPL-1 form 1 [Mus 
musculus] 



8.9 



1150 M85276



Homo sapiens NKG5 
gene, complete cds. 



1e-09



2315436 



(AF016447) No definition line 
found [Caenorhabditis elegans]



8.3 



1151 M94065



Human 
dihydroorotate 
dehydrogenase 
mRNA, 3' end. 



1e-09



3892656 



(AB014464) MGC-24v [Mus 
musculus] 



6.2 



1152 AJ131895 



Homo sapiens 
genomic CAG repeat 
element, clone 
60o2(250) 



5e-10 



1153 Z82181 



Human DNA
sequence from
cosmid E86D10 on
chromosome 22. 
contains ESTs, 
exontrap, complete 
sequence 



<NONE> 



5e-10 



728831 



<NONE> 



<NONE> 



!!!! ALU SUBFAMILY J
WARNING ENTRY 



7.9 



1154 AJ224442 



Homo sapiens mRNA 
for putative 
methyltransferase 



5e-10



113667 



!!!! ALU CLASS B WARNING 
ENTRY !!!! 



0.15 



1155 AJ010230 



Homo sapiens RET 
finger protein-like 1
antisense transcript,
partial 



5e-10



728834



1156 AF111116 



Homo sapiens 
silencer of death 
domains (SODD) 
mRNA. complete cds 



1157 Z97017 



Homo sapiens mRNA 
for hypothetical
protein 



5e-10



4160014 



4e-10 



<NONE> 



!!!! ALU SUBFAMILY SB2 
WARNING ENTRY 



0.006 



(AF111116) silencer of death
domains [Homo sapiens] 



2e-0S 



<NONE> 



<NONE> 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



1158 



1159 



ACCESSION 



DESCRIPTION 



AF001298 



1160 



Y11395 



U41096 



1161 



1162 



1163 



1165 



1166 



1167 



AF012899 



Homo sapiens type II 



integral membrane 
protein 



H.sapiens mRNA for 
p40 



Human non-coding 
sequence upstream 
from DOC-2 gene on 
chromosome 5 



Z36111 



D89174 



1164 Z95437 



AF012899



X56997 



AF086253 



Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds 



S.cerevisiae
chromosome II 
reading frame ORF 
YBR242w 



Schizosaccharomyces 

pombe mRNA. partial 

cds, clone: SY 1004 
Human DNA

sequence from 

cosmid Al on 

chromosome 6 

contains ESTs. 

HERV like retroviral 

sequence



Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds 



Human UbA52 gene 
coding for ubiquitin- 
52 amino acid fusion 
protein 



Homo sapiens full
length insert cDNA
clone ZD40G12 



4e-10 



2e-10



2e-10



6e-11



6e-11



6e-11



5e-11



5e-11



2e-11



2e-11



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



P VALUE



ACCESSION



<NONE> 



DESCRIPTION 



P VALUE 



<NONE> 



1000340 



728837 



<NONE> 



2213560 



3879758 



<NONE> 



3886065 



<NONE> 



(U34384) CheW [Borrelia 
burgdorferi] 



<NONE> 



2.4 



!!!! ALU SUBFAMILY SQ 
WARNING ENTRY 



<NONE> 



21347S0 



(Z97052) hypothetical protein 
{£W22U) similarity to yeast 

protein TREMBL ID E246895); 

cDNA EST EMBL:T00018

comes from this gene; cDNA 

EST EMBL:C13908 comes 

from this gene; cDNA EST 

EMBL:C11656 comes from this

gene; cDNA EST yk234a5.3 

comes from this ge... 



<NONE> 



(AF106581) contains similarity 
to C4-type zinc fingers



<NONE> 



apoptosis inhibitor IAP homolog 
- human 



0.28 



<NONE> 



3e-27 



4e-30 



<NONE> 



4.9 



<NONE> 



3.8 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



P VALUE 



1168 



1169 



AB018314 



Homo sapiens mRNA 
for KIAA0771
protein, partial cds 



2e-11



3024343 



274972 



S.cerevisiae 
chromosome XV 
reading frame ORF 
YOR064c 



P53-BINDING PROTEIN
53BP2 Bbp/53BP2 [Homo 
sapiens] 



2e-11



2e-11



3041855 



(AC004537) similar to tumor 
suppressor p33ING1; similar to
AF044076 (PID:g2829208)
[Homo sapiens]



2e-40 



1170 



Z82181



Human DNA 
sequence from 
cosmid E86D10 on
chromosome 22.
contains ESTs,
exontrap, complete
sequence



7e-12 



<NONE> 



<NONE> 



<NONE> 



1171 



X77738 



H.sapiens red cell 
anion exchanger 
(EPB3, AE1, Band 3)
gene, 3' region 



7e-12 



2135416 



hypothetical protein - human 
>gi|288145 



0.012 



1172 



1173 



1174 



S61977 



medium-chain acyl- 
CoA dehydrogenase 
{exon 10, intron 10} 
[human, Genomic, 
1407 nt] 



6e-12 



113666 



!!!! ALU CLASS A WARNING
ENTRY !!!!



0.100 



X66285 



M.musculus DNA for 
HCI locus 



6e-12 



854065 



S78744 



protein S=activated
protein C cofactor
[rats, liver, mRNA,
3315 nt]



(X83413) U88 [Human
herpesvirus 6] 



2e-06 



6e-12



2338292 



(AF009243) proline-rich Gla 
protein 2 [Homo sapiens] 



3e-10 



1175 



X58474 



Bovine OXT gene for 
oxytocin, 5' 
noncoding region



2e-12



1296429 



(L77967) small proline-rich 
protein with paired repeat



4.1 



1176 



Z56314 



H.sapiens CpG DNA, 
clone 10h10, reverse
read cpg10h10.rt1a.



2e-12



2935221



(AF030154) pVII [bovine 
adenovirus type 3] 



2.8 



1177 



Z56314 



H.sapiens CpG DNA, 
clone 10h10, reverse
read cpg10h10.rt1a.



2e-12 



2708659 



(AF037440) putative 26 kDa 
protein [Edwardsiella ictaluri] 



2.8 



1178



Z19543 



M.musculus h2-
calponin cDNA



2e-12 



2497945 



BETA SCRUIN >gi|1015535
(Z47541) beta scruin [Limulus
polyphemus]






2e-0-i 








Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






1179


S45332


erythropoietin
receptor [human,
placental, Genomic,
8647 nt]


7e-13 


728835 


!!!! ALU SUBFAMILY SC 
WARNING ENTRY 


0.074 


1180


AF012899


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA. complete cds 


2e-13


<NONE> 


<NONE> 


<NONE> 


1181 



AF012899


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA. complete cds 


2e-13 


<NONE> 


<NONE> 


<NONE> 


1182 


Z59509 


H. sapiens CpG DNA, 
clone 15a1, reverse
read cpg15a1.rt1a.


2e-13 


3150251 


(AL023634) hypothetical 
protein 


0.66


1183


D10170 


Human CYP11B2
gene for steroid 18- 
hydroxylase 


2e-13 


728837 


!!!! ALU SUBFAMILY SQ 
WARNING ENTRY 


3e-05 


1184 


U65416 


Human MHC class I 
molecule (MICB) 
gene, complete cds 


2e-13 


126295 


LINE- 1 REVERSE 
TRANSCRIPTASE 
HOMOLOG 


6e-11


1185 


AJ006031


Mus musculus 
IH ABP gene, 
promoter 


8e-14 


2132223 


hypothetical protein YPL186c -
yeast 


1.1 


1186


U34976


Human gamma-
sarcoglycan mRNA,
complete cds 


8e-14




1054903 


(U34976) gamma-sarcoglycan
[Homo sapiens] >gi|4239660
sapiens]


0.034 


1187 


D30647 


Rat mRNA for very- 
long-chain Acyl-CoA 
dehydrogenase, 
complete cds


oe-14 


3183512 


ACYL-COA 

DEHYDROGENASE, VERY- 
LONG-CHAIN SPECIFIC 
(VLCAD) >gi|2388724 
(AF017176) very-long-chain
acyl-CoA dehydrogenase [Mus 
musculus] 


8e-2_> 


1188 


Z63247 


H.sapiens CpG DNA. 
clone 7g4, forward 
read cpg7g4.fla . 


6e-14 


86285 


histone H1.01 - chicken


6.8 


1189 


U27196 


Gallus gallus zinc
finger protein (Fzf-1)
mRNA, complete cds. 


3e-14


2134436 


zinc finger protein - chicken
(fragment)


4e-10 


1190


M26219


African green 
monkey origin of 
replication 


2e-14 


<NONE> 


<NONE> 


<NONE> 



ACCESSION 



DESCRIPTION 



P VALUE 



1191 AF100694



1192 AF012899



1193 AJ005866



1194 U32709



1195 AF073485



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



DESCRIPTION 



Mus musculus 



Pontin52 mRNA, 
complete cds 

Sambucus nigra 
ribosome inactivating
protein precursor 
mRNA. complete cds 



Homo sapiens mRNA 
for putative Sqv-7 
like protein, partial

Haemophilus 
influenzae Rd section 
24 of 163 of the 
complete genome 



Homo sapiens MHC 
class I-related protein 
MR1 precursor
(MR1) gene, partial
cds 



1196 AF052135 



1197 AF100694



1198 AF012899



1199 Z75104 



Homo sapiens clone
23625 mRNA 



Mus musculus 
Pontin52 mRNA, 
complete cds 



Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA. complete cds 



S.cerevisiae 
chromosome XV 
reading frame ORF 
YOR196C 



2e-14



4235641 



2e-14 



3043728 



2e-14 



4008517 



2e-14 



3861(356 



8e-15



728831 



8e-15 



4098124 



3e-15 



<NONE> 



3e-15 



113671



3e-15 



3878570 



P VALUE 



(AF119040) NL0D
[Lycopersicon esculentum] 



0.65 



(AB011174) KIAA0602 protein
[Homo sapiens] 



(AJ005866) Sqv-7-like protein 
[Homo sapiens] 
(AJ235272) 

POLYRIBONUCLEOTIDE
NUCLEOTIDYLTRANSFERASE
(pnp) [Rickettsia
prowazekii] 



0.004 



!!!! ALU SUBFAMILY J 
WARNING ENTRY 



(U73522) AMSH [Homo 
sapiens] 



<NONE> 



<NONE> 



!!!! ALU CLASS F WARNING
ENTRY !!!! 



L7 



(Z46381) similar to lipoic acid
synthase; cDNA EST yk283b6.3
comes from this gene; cDNA
EST yk283b6.5 comes from this
gene; cDNA EST yk472f5.3
comes from this gene; cDNA
EST yk472f5.5 comes from this
gene; cDNA EST yk476e7.3...



1e-15








Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION


DESCRIPTION 


P VALUE 








1200


X70052


S.cerevisiae sof1
gene


3e-15


1125754


(U42H33!) coded for by C.
elegans cDNA cm16f6; coded
for by C. elegans cDNA
CEESU63F; similar to S.
cerevisiae SOF1 protein
elegans]


3e-29 


1201 


AF012899 


Sambucus nigra 
ribosome inactivating 
protein precursor 
mRNA, complete cds 


2e-15 


<NONE>


<NONE> 


<NONE> 


1202 


M92295 


Gorilla gorilla gamma 
1 and gamma-2 
globin genes, 
complete cds. 


1e-15


284078 


hypothetical protein 2 - human 
>gi|182220 


7.4 


1203 


L34587 


Homo sapiens RNA 
polymerase II 
elongation factor SIIL 
p15 subunit mRNA,
complete cds. > ::
gb|AR022286|AR022286
Sequence 7 from
patent US 5792634 


9e-16 


<NONE> 


<NONE> 


<NONE> 


1204 


D83649 


Xenopus laevis 
mRNA for xSox7 
protein, complete cds 


8e-16 


2447043 


(D83649) xSox7 protein 
[Xenopus laevis]


**e-uo 


1205 


AC005190 


Homo sapiens PAC 
clone DJ1152D16 
from Xq23; complete 
sequence [Homo 
sapiens] 


3e-16 


<NONE> 


<NONE> 


<NONE> 


1206 


J03626 


Human UMP 
synthase mRNA. 
complete cds. 


3e-16 


113667 


!!!! ALU CLASS B WARNING
ENTRY !!!!


0.65 


1207 


J00083 


Human Alu family 
interspersed repeat;
clone BLUR11. 


3e-16


728836 


!!!! ALU SUBFAMILY SP 
WARNING ENTRY 


4e-06 


1208


U70674 


Mus musculus m-
Numb (m-nb) mRNA,
complete cds


1e-16


<NONE> 


<NONE> 


<NONE> 








Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















1209 


U66619 


Human SWI/SNF 
complex 60 KDa 
subunit (BAF60c) 
mRNA, complete cds 


1e-16


1549247 


(U66619) SWI/SNF complex 60 
KDa subunit [Homo sapiens] 


0.003 


1210 


U75467 


Drosophila 
melanogaster Rga and 
Atu genes, complete 
cds 


1e-16


1658503 


(U75467) Atu [Drosophila 
melanogaster] 


5e-32 


1211 


M72709 


Human alternative 
splicing factor 
mRNA, complete cds. 


3e-17 


<NONE> 


<NONE> 


<NONE> 


1212 


U26556 


Human ferritin H 

(FTHL13) 

pseudogene.


3e-17 


<NONE> 


<NONE> 


<NONE> 


1213 


D32064 


Human gene for 2- 
oxoglutarate 
dehydrogenase, 
complete cds 


3e-17 


2088843 


(AF003386) F59E12.9 gene 
product [Caenorhabditis 
elegans]


0.12 


1214 


M76364 


Human (Papua New 
Guinean) 

Mitochondrial DNA 
control region, 
sequence 131. 


3e-17 


114009 


APAG PROTEIN
>gi|72927|pir||BVECAG apaG 
protein - Escherichia coli 
>gi|40918 (X04711) URF
hypothetical protein 
[Escherichia coli] 


0.006 






1215


AF017466


Homo sapiens
genomic sequence
from subtelomeric
region of
chromosome 4q


1e-17


3947985 


(U78948) MADS-box protein 2 
[Malus domestica]


4.1 



1216 


AF004876 


Homo sapiens 
54TMp (54tm) 
mRNA. complete cds 


1e-17


4101574 


(AF004876) 54TMp [Homo 
sapiens]


0.006 


1217 


AF100694


Mus musculus
Pontin52 mRNA, 
complete cds 


9e-18


<NONE> 


<NONE> 


<NONE> 


1218 


AF086758 


Rattus norvegicus Na- 
K-2Cl cotransporter


4e-18 


3892703 


(AL033545) putative glycine-
rich protein [Arabidopsis
thaliana] 


0.30 


1219 


AF020089 


Homo sapiens 
PEN11B mRNA,
complete cds 


4e-18 


2642493 


(AF023910) DNA 
topoisomerase I [Physarum 
polycephalum]


0.0S3 


1220 


X82333 


H. sapiens IRLB gene 
(exon 1-3)


4e-18 


106837 


irlB protein - human (fragment) 
>gi|33969


2e-11



Nearest Neighbor (BlastN vs. Genbank) | Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 




1221


AB002383


Human mRNA for
KIAA0385 gene,
complete cds


4e-18


3228540 


(AF060181) zinc Finger protein 
[Homo sapiens] 


6e-25 


1222 


X98485


P.vivax PV14 gene


1e-18


<NONE> 


<NONE> 


<NONE> 


1223 


Z79057


H.sapiens flow-sorted
chromosome 6
HindIII fragment,
SC6pA21E8


1e-18


2981631 


(AB012223) ORF2 [Canis
familiaris] 


0.001


1224 


L01457


Homo sapiens (clone
JH4B1) PM-scl
autoantigen mRNA,
complete cds.


1e-18


346287 


nucleolar 100K polymyositis- 
scleroderma protein - human 
>gi|35555 (X66113) PM/Scl 
100kD nucleolar protein [Homo
sapiens] 


0.001 


1225 


L02897


Dog nonerythroid
beta-spectrin mRNA,
3' end.


4e-19


3493358 


(AB017037) nonstructural
protein precursor [Himetobi P
virus]


0.12 


1226 


AB012162


Homo sapiens mRNA
for APCL protein,
complete cds


4e-19 


3894265 


(AB012162) APCL protein
[Homo sapiens]


0.002 


1227 


AB011093


Homo sapiens mRNA
for KIAA0521
protein, partial cds


4e-19 


3043566 


(AB011093) KIAA0521 protein
[Homo sapiens] 


9e-09 


1228 


X78454


X.laevis AB21
mRNA for RPD3
homologue


4e-19 


3023945 


HISTONE DEACETYLASE 
(HD) thaliana] 


5e-34 


1229 


U88895


Human endogenous
retrovirus H D1
leader
region/integrase-
derived ORF1,
ORF2, and putative
envelope protein
mRNA, complete cds


2e-19 


59977 


(Z14310) tripartite fusion
transcript PLA2L [Human
endogenous retrovirus]


1e-04


1230 


U34377


Human tyrosine
kinase TXK (txk)
gene, exon 13.


1e-19


728831 


!!!! ALU SUBFAMILY J 
WARNING ENTRY 


3e-05 


1231 


X72966


M.musculus rab3A
gene


1e-19


2408076


;Z9 l J 167) putative peroxisomal
organisation and biogenesis
protein [Schizosaccharomyces
pombe]


2e-09 


1232


AB007953


Homo sapiens
mRNA, chromosome
1 specific transcript
KIAA0484


4e-20 


<NONE> 


<NONE> 


<NONE> 



SEQ
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



1233 



1234 



DESCRIPTION 



P VALUE 



1235 



1236 



D14034
Human gene for Zn-
alpha2-glycoprotein,
complete cds

X82126
H.sapiens HOK-2
gene, exon 2



AF093684
Luciferase reporter
vector pXP2 *SA,
complete sequence



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



J05272 



Human IMP 
dehydrogenase type 1 
mRNA complete cds. 



1237 



D86997 



1238 



Z79865 



Human (lambda) 
DNA for 

immunoglobulin light 
chain 



H.sapiens
chromosome 22 CpG
island DNA genomic
Mse1 fragment, clone
302f3, forward read 
302f3.f 



2e-20 
2e-20 



DESCRIPTION 



5e-21 



5e-21 



5e-21



2e-21 



(AB001535) similar to



P VALUE 



3928756 



2137269 



C.elegans hypothetical protein
CET01H8.1,CEC05C12.3,CEF54D1.5,
similar to trp and trp-like
proteins [Homo sapiens]
DNA-binding protein - mouse
>gi|437444



2773363 



124417 



3878261 



2739037 



(AF024614) ADAM 10 
[Caenorhabditis elegans] Zinc- 
binding metalloprotease domain;
cDNA EST CEMSA42F comes
from this gene; cDNA EST
yk218G.3 comes from this gene;
cDNA EST yk443d9.3 comes
from this gene; cDNA EST
yk443d9.5 comes from this
gene; cDNA...



1e-07



1e-19



(AF041382) microtubule
binding protein D-CLIP-190 | 5 5

INOSINE-5'-
MONOPHOSPHATE
DEHYDROGENASE 1 (IMP
DEHYDROGENASE 1)
(IMPDH-I) (IMPD 1) - human | 2e-04

(Z75712) Similarity to S. pombe
BEM1/BUD5 suppressor;
cDNA EST EMBL:Z14470
comes from this gene; cDNA
EST yk482d4.3 comes from this
gene; cDNA EST yk482d4.5
comes from this gene
[Caenorhabditis elegans] | 6e-46



2.6 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



P VALUE ACCESSION 



1239 






AF100694
Mus musculus
Pontin52 mRNA,
complete cds



1240 



1241 



U67824
Human primary Alu
transcript



AF070636
Homo sapiens clone
24686 mRNA
sequence



1242 



Human gene for Zn- 
alpha2-glycoprotein, 
D14034 complete cds



1243 



Human lactase 
phlorizin hydrolase 
M61835 (LCT) gene, exon 2.



1244 



1245 



Mus musculus 
Pontin52 mRNA, 
AF100694 complete cds 



Homo sapiens full 
length insert cDNA
AF074985 YH73H06 



1246 



D14878 



1247 



D16917 



Human mRNA for 
protein D123,
complete cds 



Human HepG2 3'
region cDNA, clone 
hmd3d07 



6e-22 



6e-22 



2e-22 



2e-22 



2e-22 



6e-23 



8e-24 



7e-24 



6e-24 



3924779 



728832 



98710 



4185939 



728831 



1350828 



3170548 



<NONE> 



1397345 



DESCRIPTION 



u/u~uuujujj iiiuwai lu uuiiiiin — 
D, lDNA E3T >UjQU3.j luu i k 
from this gene; cDNA EST 
yk249a6.5 comes from this 
gene; cDNA EST yk219a2.5
comes from this gene; cDNA 
EST yk355e4.5 comes from this 
gene; cDNA EST yk224f4.5 
comes fr... 

>gi|3924881|gnl|PID|e1354569
from this gene; cDNA EST 
yk249a6.5 comes from this 
gene; cDNA EST yk219a2.5 
comes from this gene; cDNA 
EST yk355e4.5 comes from this 
gene; cDNA EST yk224f4.5 
comes from... 



P VALUE 



!!!! ALU SUBFAMILY SB
WARNING ENTRY



fatty-acid synthase (EC 
2.3.1.85) - Brevibacterium
ammoniagenes



(Y17832) pol protein [Human 
endogenous retrovirus K]



!!!! ALU SUBFAMILY J
WARNING ENTRY



RABPHILIN-3A

>gi|477100|pir||A48097
rabphilin-3A - bovine

>gi|285646|gnl|PID|d1003285



(AF056116) unknown [Fugu
rubripes] 



<NONE> 



(U61955) contains multiple 
region of strong similarity to
C2H2-type zinc fingers
(PS:PS00028) [Caenorhabditis
elegans]



0.35 



5e-07 



2.5 



0.29 



0.006 



0.14 



0.24 



<NONE> 



2.4 






Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



P VALUE 



1248 Z69654



1249 AB007914



sequence from 
cosmid L98A6. 
Huntington's Disease 
Region, chromosome 
4p16.3.

Homo sapiens mRNA
for KIAA0445
protein, complete cds



1250 



Homo sapiens full 
length insert cDNA 
AF088072 clone ZD93D10 



1251 



Homo sapiens cAMP-
specific

phosphodiesterase 4A
variant pde46
(PDE4A) gene, exons
2 through 13 and
alternative splice
exons 3a, 6a, 6b, and
AF069489 9a



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



DESCRIPTION 



P VALUE 



3e-24 



4240566 



2e-24



3885949 



2e-24 



323091 



(AF123462) neurexin III [Homo
sapiens]



(AF095568) amelogenin 

[Paleosuchus palpebrosus] 
immunodominant microneme
protein Etp100 - Eimeria tenella
>gi|2707733 (AF032905)
microneme protein precursor
Etmic-1 [Eimeria tenella]



4.5 



0.34 



1252 Y12853



Homo sapiens P2X7 
gene, exon 4-8



Human 28S



1253 M27830 complete cds.



1254 



Homo sapiens 
mRNA, chromosome 
1 specific transcript 
AB007953 KIAA0484



1255 Z60212 



1256 AF100694



H.sapiens CpG DNA,
clone 195c8, forward
read cpg195c8.ft1a.
Mus musculus
Pontin52 mRNA,
complete cds



1257 



Mus musculus 
Pontin52 mRNA, 
AF100694 complete cds



1258 



Homo sapiens P2X7 
gene, exon 1 and 
Y12851 joined CDS 



2e-24 



728836 



!!!! ALU SUBFAMILY SP 
WARNING ENTRY



9e-25 



728831 



!!!! ALU SUBFAMILY J 
WARNING ENTRY



8e-25 



<NONE> 



<NONE> 



8e-25 



<NONE> 



8e-25 



7e-25 



158154 



<NONE> 



7e-25 



<NONE> 



2e-25 



<NONE> 



<NONE> 



(M81959) POU domain protein 
[Drosophila melanogaster]



1e-05



<NONE> 



<NONE> 



3.3 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



P VALUE 



Mus musculus Tera 



1259 



(Tera) mRNA, 
U64033 complete cds 



1260 



Rattus norvegicus 
Rabin3 mRNA. 
U19181 complete cds. 



9e-26 



9e-26 



1261 



Caenorhabditis 
elegans SEL-10 (sel-
10) mRNA, complete
AF020788 cds



1262 



1263 



Cricetulus griseus 
mRNA for 

phosphatidylglycerop
hosphate synthase,
AB016930 complete cds



1264 



1265 



Mus musculus 
Pontin52 mRNA, 
AF100694 complete cds



H.sapiens SOM172
X91195 mRNA 



9e-26 



<NONE> 



624225 



<NONE> 



(U19181) Rabin3 [Rattus
norvegicus] 



W 

bbL-IUPKUlhlN Candida 

CDC4 gene (TR:E234056); 

cDNA EST EMBL:D27699 

comes from this gene; cDNA 

EST EMBL:D27698 comes 

from this gene; cDNA EST 

EMBL:D32793 comes from this 

gene; cDNA EST 

EMBL:D33271 comes from this 

gen... 



3915881 



8e-26 



3e-26 



<NONE> 



1e-13



4159682 



3878629 



1e-26



<NONE> 



(AB016930) 

Phosphatidyl glycerophosphate 
synthase [Cricetulus griseus] 



(Z93385) predicted using 
GeneFinder; Similarity to 
B.subtilis GTP-binding protein 



<NONE> 



7e-32 



0.045 



2e-10 



<NONE> 



1266 



Mus musculus 
Pontin52 mRNA, 
AF100694 complete cds



1e-26



1360637 



Human MG21 
L08237 mRNA. partial cds. 



1e-26



1267 



1268 



(X95995) ENBP1 [Vicia sativa] 



3.1 



950411 



(L08237) located at OATL1 
[Homo sapiens] 



9e-09 



Mus musculus 
Pontin52 mRNA, 
AF100694 complete cds 



Mus musculus 
Pontin52 mRNA, 
AF100694 complete cds



9e-27 



3881080



(AL032657) similar to EGF-like 
domain; cDNA EST yk299al2.3 
comes from this gene; cDNA 
EST EMBL:D35398 comes
from this gene; cDNA EST 
yk33 1H6.5 comes from this 
gene; cDNA EST yk299al2.5 
comes from this gene; cDNA 
EST yk467gS... 



0.001 



8e-27 



1731324 



HYPOTHETICAL PROTEIN
>gi|166306



4.0 








Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















1269 


X89211 


H. sapiens DNA for
endogenous retroviral 
like element 


8e-27 


2065209 


(Y12713) Gag polyprotein [Mus
musculus] 


0.005 


1270 


U73166 


Homo sapiens cosmid 
clone LUCA15 from
3p21.3, complete
sequence [Homo
sapiens]


3e-27 


728831 



!!!! ALU SUBFAMILY J 
WARNING ENTRY 


4e-04 


1271 


D78255 


Mouse mRNA for
PAP-1, complete cds


3e-27 


1850098 


(D78255) PAP-1 [Mus 
musculus] 


2e-10 


1272 


AF100694


Mus musculus
Pontin52 mRNA,
complete cds


1e-27


2133579 


spermatophorin Sp23 - yellow
mealworm molitor] 


0.39 


1273 


AB015202


Homo sapiens gene
for hippocalcin, exon
2, 3 and complete cds


1e-27


3877698 


(Z83318) predicted using 
Genefinder; cDNA EST 
yk369e7.5 comes from this gene 
[Caenorhabditis elegans]


0.37 


1274 


AF100694


Mus musculus
Pontin52 mRNA,
complete cds


1e-27


3328188 


(AF074902) laminin alpha chain 
[Caenorhabditis elegans]


0.19 


1275 


Z29336 


H.sapiens gene for 

Cu/Zn-superoxide 

dismutase 


1e-27


728831 


!!!! ALU SUBFAMILY J 
WARNING ENTRY 


6e-05 


1276 




Mus musculus 
Pontin52 mRNA, 
complete cds






spermatophorin Sp23 - yellow


9.2 


1277 


AF100694


Mus musculus
Pontin52 mRNA, 
complete cds 


9e-28 


2133579 


spermatophorin Sp23 - yellow
mealworm molitor] 


0.054 


1278 


AB001636 


Homo sapiens mRNA
for ATP-dependent
RNA helicase #46,
complete cds 


4e-28 


3913425 


PUTATIVE PRE-MRNA
SPLICING FACTOR ATP-

DEPENDENT RNA
HELICASE >gi|2275203
(AC002337) RNA helicase
isolog [Arabidopsis thaliana]


3e-22 


1279 


AF100694


Mus musculus
Pontin52 mRNA, 
complete cds 


3e-28 


4056454 


(AC005990) Contains repeated
region with similarity to
gb|U43627 extensin (atExtl)
gene from Arabidopsis thaliana.
ESTs gb|Z34165 and gb|Z18788
come from this gene.
[Arabidopsis thaliana]


0.066 



Nearest Neighbor (BlastN vs. Genbank)



SEQ 

ID ACCESSION



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



DESCRIPTION 



P VALUE 



1280 



Mus musculus 
Pontin52 mRNA, 
AF100694 complete cds



1281



1282 



1283 



1284 



1285 



Mus musculus 
Pontin52 mRNA, 
AF100694 complete cds



Mus musculus
Pontin52 mRNA,
AF100694 complete cds



Mus musculus
Pontin52 mRNA,
AF100694 complete cds



Mus musculus
Pontin52 mRNA,
AF100694 complete cds



Mus musculus 
Pontin52 mRNA, 



AF100694 complete cds 




1286 



1287 



1288 



Mus musculus 
Pontin52 mRNA, 
AF100694 complete cds



Mus musculus
Pontin52 mRNA,
AF100694 complete cds



Mus musculus
Pontin52 mRNA,
AF100694 complete cds



Mus musculus
Pontin52 mRNA,
1289 AF100694 complete cds



1290 



1291 



Mus musculus 
Pontin52 mRNA, 
AF100694 complete cds






3e-28 



Z63029



H.sapiens CpG DNA,
clone 77b3, forward
read cpg77b3.ft1a.



1e-28



1e-28



1e-28



1e-28



1e-28



1e-28



1e-28



1e-28



1e-28



1e-28



4056454 






1e-28



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



140505 



140505 



2133579 



4056454 



(AC005990) Contains repeated



2493240 



region with similarity to 
gb|U43627 extensin (atExtl) 
gene from Arabidopsis thaliana. 
ESTs gb|Z34165 and gb|Z18788
come from this gene. 
[Arabidopsis thaliana] 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



PROBABLE INTRON 
MATURASE liverwort 
(Marchantia polymorpha)
chloroplast >gi|11663



PROBABLE INTRON 
MATURASE liverwort 
(Marchantia polymorpha)
chloroplast >gi|11663



spermatophorin Sp23 - yellow
mealworm molitor]



(AC005990) Contains repeated
region with similarity to
gb|U43627 extensin (atExtl)
gene from Arabidopsis thaliana.
ESTs gb|Z34165 and gb|Z18788
come from this gene.
[Arabidopsis thaliana]



HYPOTHETICAL 293 KD 
PROTEIN pseudotsugata 
nuclear polyhedrosis virus]



4e-05 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



<NONE> 



3.0 



1.8 



0.50 



0.0S7 



0.014 







Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



Mus musculus 
Pontin52 mRNA, 
1292 AF100694 complete cds



Mus musculus
Pontin52 mRNA,
1293 AF100694 complete cds



1294 AF100694



1295 AF100694 



Mus musculus 
Pontin52 mRNA, 
complete cds 
Mus musculus

Pontin52 mRNA, 
complete cds 



Mus musculus 
Pontin52 mRNA,
1297 AF100694 complete cds



Mus musculus
Pontin52 mRNA,
1298 AF100694 complete cds



1299 AF100694



Mus musculus 
Pontin52 mRNA, 
complete cds 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



1e-28



118588



1e-28



4056454



1e-28



1e-28



4056454



126363



1e-28



4056454



1e-28



4056454



1e-28



3157926



1e-28



4056454 



P VALUE 



>gi|100035|pir||S18139 dehydrin
DHN3 - garden pea >gi|20709
(X63063) pea dehydrin DHN3
[Pisum sativum]



0.010 



(AC005990) Contains repeated
region with similarity to
gb|U43627 extensin (atExtl)
gene from Arabidopsis thaliana.
ESTs gb|Z34165 and gb|Z18788
come from this gene.

[Arabidopsis thaliana] | 0.007

(AC005990) Contains repeated
region with similarity to
gb|U43627 extensin (atExtl)
gene from Arabidopsis thaliana.
ESTs gb|Z34165 and gb|Z18788
come from this gene.
[Arabidopsis thaliana] | 0.002



LAMININ ALPHA-1 CHAIN
PRECURSOR precursor -

human | 3e-04

Contains repeated
region with similarity to
gb|U43627 extensin (atExtl)
gene from Arabidopsis thaliana.
ESTs gb|Z34165 and gb|Z18788

come from this gene.

[Arabidopsis thaliana] | 1e-04



Contains repeated 
region with similarity to 
gb|U43627 extensin (atExtl) 
gene from Arabidopsis thaliana. 
ESTs gb|Z34165 and gb|Z18788
come from this gene.
[Arabidopsis thaliana]



3e-05 



(AC002131) Strong similarity to 
extensin-like protein gb|Z34465 
from Zea mays. [Arabidopsis
thaliana]

(AC005990) Contains repeated
region with similarity to
gb|U43627 extensin (atExtl)
gene from Arabidopsis thaliana.
ESTs gb|Z34165 and gb|Z18788
come from this gene. 
[Arabidopsis thaliana] 



2e-05 






1e-05






SEQ 
ID 



1300 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



AF100694



Mus musculus 
Pontin52 mRNA, 
complete cds 



1301 AF100694 



Mus musculus 
Pontin52 mRNA,
complete cds 



1302 



1303 



AF100694



AF100694 



Mus musculus 
Pontin52 mRNA, 
complete cds 



Mus musculus 
Pontin52 mRNA, 
complete cds 



1304 



1305 



AF100694



Mus musculus 
Pontin52 mRNA, 
complete cds 



AF100694



1306 AF100694



Mus musculus 
Pontin52 mRNA, 
complete cds 



Mus musculus 
Pontin52 mRNA, 
complete cds 



P VALUE 



1e-28



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



DESCRIPTION 



320919 



1e-28



4056454



1e-28



4056454



1e-28



4056454



1e-28



4056454



1e-28



4056454



1e-28



kinetoplast-associated protein 



P VALUE 



Trypanosoma cruzi >gi|162142
(M25364) kinetoplast-associated 
protein 



4056454 



(AC005990) Contains repeated

region with similarity to

gb|U43627 extensin (atExtl)

gene from Arabidopsis thaliana.

ESTs gb|Z34165 and gb|Z18788

come from this gene.

[Arabidopsis thaliana]

(AC005990) Contains repeated

region with similarity to

gb|U43627 extensin (atExtl)

gene from Arabidopsis thaliana.

ESTs gb|Z34165 and gb|Z18788

come from this gene.

[Arabidopsis thaliana] 



1e-07



9e-08 



Contains repeated 
region with similarity to 
gb|U43627 extensin (atExtl) 
gene from Arabidopsis thaliana 
ESTs gb|Z34165 and gb|Z18788
come from this gene 
[Arabidopsis thaliana] 



(AC005990) Contains repeated
region with similarity to 
gb|U43627 extensin (atExtl) 
gene from Arabidopsis thaliana 
ESTs gb|Z34165 and gb|Z18788 
come from this gene. 
[Arabidopsis thaliana] 



(AC005990) Contains repeated 
region with similarity to 
gb|U43627 extensin (atExtl) 
gene from Arabidopsis thaliana. 
ESTs gb|Z34165 and gb|Z18788
come from this gene. 
[Arabidopsis thaliana] 



(AC005990) Contains repeated
region with similarity to
gb|U43627 extensin (atExtl)
gene from Arabidopsis thaliana.
ESTs gb|Z34165 and gb|Z18788
come from this gene.
[Arabidopsis thaliana]



1e-09



9e-10 



4e-10



9e-11



6e-11








Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ
ID


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






Mus musculus 










1307 




Pontin52 mRNA,
complete cds 


4e-29 



<NONE> 


<NONE> 


<NONE> 


1308 




Homo sapiens cAMP- 
specific 

phosphodiesterase ab


4e-29 


<NONE> 


<NONE> 


<NONE> 


1309 


X93334 


H. sapiens 

mitochondrial DNA,
complete genome 


4e-29 


116977 


CYTOCHROME C OXIDASE 
POLYPEPTIDE I chain I - 
human mitochondrion (SGC1) 
>gi| 13006 (V00662) cytochrome 
oxidase I [Homo sapiens] 
>gi|506829 (J01415)
cytochrome oxidase subunit 1
[Homo sapiens]


3e-09 


1310 


AF020760 


Homo sapiens serine 
protease (Omi) 
mRNA, complete cds 


4e-29 


2738915 


(AF020760) serine protease 
[Homo sapiens] 


8e-12


1311 


U95097 


Xenopus laevis 


mitotic 

phosphoprotein 43 
mRNA, partial cds


4e-29 


2072294 


(U95097) mitotic 
phosphoprotein 43 [Xenopus 
laevis] 


1e-25


1312


L32162


Homo sapiens 
transcription factor 
mRNA, 5' end. 


2e-29 


2501706 


RENAL TRANSCRIPTION 
FACTOR KID-1 finger protein 
[Mus musculus] 


8e-15 


1313 


AF100694


Mus musculus
Pontin52 mRNA,
complete cds


1e-29


4056454 


(AC005990) Contains repeated

region with similarity to

gb|U43627 extensin (atExtl)
gene from Arabidopsis thaliana.
ESTs gb|Z34165 and gb|Z18788
come from this gene.
[Arabidopsis thaliana]


1e-04


1314 


AF100694


Mus musculus
Pontin52 mRNA,
complete cds


1e-29


1169643 


FMRFAMIDE-RELATED
NEUROPEPTIDES
PRECURSOR >gi|416208
(U03137) neuropeptide
precursor FMRFamide-related
peptide [Lymnaea stagnalis]


1e-05


1315 


U50839


Homo sapiens g16
protein (g16) mRNA,
complete cds


1e-29


3212101 


(AF069517) RNA binding
protein DEF-3 [Homo sapiens]


6e-10








Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 












intercellular adhesion molecule




1316 


X69711


H. sapiens mRNA for 
ICAM-R 


5e-30 


299356 


3, ICAM-3=lymphocyte 
function- associated antigen 1 
counter-receptor homolog 
[human, tonsil, Peptide Partial, 
518 aa] 


1 - AO 


1317 


AF010227


Homo sapiens 
receptor-associated 
coactivator 3 


5e-30 


2331250 


(AF012108) Amplified in Breast
Cancer [Homo sapiens] 


8e-09 


1318 


AF086395 


Homo sapiens full
length insert cDNA
clone ZD75C01 


2e-30 


3861241 


(AJ235273) CELL SURFACE 
ANTIGEN (sca5) 


4.2 


1319 


M27830 


Human 28S 
ribosomal RNA gene, 
complete cds. 


2e-30 


1730522 


PHOSPHOGLYCERATE
KINASE 2.7.2.3) - Pyrococcus
woesei >gi|1054832 (X73527)
phosphoglycerate kinase 
[Pyrococcus woesei] 


3.8 


1320 


M79307 


Mouse GTP-binding 
protein (Rab17)
mRNA sequence. 


2e-30 


1 464564 


RAS-RELATED PROTEIN 
RAB-17 Rab17 - mouse
(fragment) >gi|297157
(X70804) rab17 [Mus musculus]


9e-11


1321 


AL022168 


Human DNA 
sequence from clone 
U247E12 on
chromosome Xq22- 
23, complete 
sequence [Homo 
sapiens] 


1e-30


2072967 


(U93570) putative p150 [Homo
sapiens]


3e-11


1322 


X85124 


M.musculus pacsin 
gene 


1e-30


2217964


(Z50798) p52 [Gallus gallus]


1e-34


1323 


U37408 


Homo sapiens 
phosphoprotein CtBP 
mRNA, complete cds


5e-31 


74518 


structural polyprotein - 
Venezuelan equine encephalitis
virus (strain TRD) >gi|323710 
(J04332) poly-envelope protein 
[Venezuelan equine encephalitis 
virus] 


LI 


1324 


L04193 


Human lens 
membrane protein 
(mp19) gene, exon
11.


2e-31


728831 


!!!! ALU SUBFAMILY J 
WARNING ENTRY 


7e-07 



1325 


M11167


Human 28S
ribosomal RNA gene. 


6e-32 


<NONE> 


<NONE> 


<NONE>






ACCESSION 



DESCRIPTION 



P VALUE 



1326 M33336 



1327 J03060 



1328 U33053 



1329 J04617



1330 L40396



1331 Z72813 



1332 AB007941 



1333 AF044574



1334 D14657



1335 X69910



Human cAMP-
dependent protein
kinase type I-alpha 
subunit (PRKAR1A) 
mRNA, complete cds 



Human 

glucocerebrosidase 
pseudogene, complete
cds 



Human lipid- 
activated protein 
kinase PRK1 mRNA,
complete cds 



Human elongation 
factor EF-1-alpha
gene, complete cds. >
:: dbj|E02629|E02629
DNA of human
polypeptide chain
elongation factor-1
alpha 



ACCESSION 



DESCRIPTION 



P VALUE 



2e-32 



<NONE> 



<NONE> 



2e-32 



2144479 



glucosylceramidase (EC 
3.2.1.45) precursor - human 



7e-33 



2137689 



protein kinase (EC 2.7.1.37) 
mouse 



Homo sapiens (clone
s22i71) mRNA
fragment



<NONE> 



1e-05



1e-14



6e-33 



<NONE> 



<NONE> 



S.cerevisiae 



reading frame ORF 
YGR028w



6e-33 



124235 



INTERMEDIATE FILAMENT 
PROTEIN B protein B- 
common roundworm 



MSP1 PROTEIN HOMOLOG



6e-33 



Homo sapiens mRNA
for KIAA0472
protein, partial cds 



Rattus norvegicus 
putative peroxisomal 
2,4-dienoyl-CoA 
reductase (DCR- 
AKL) mRNA, 

complete cds 

Human mRNA for 
KIAA0101 gene,
complete cds 



H.sapiens p63 mRNA 
for transmembrane 
protein 



2e-33 



2e-34 



7e-35 



7e-35 



1709135 



Yeast MSP1 protein (TAT-
binding homolog 4)



1 150834 



(U42471) Wiscott-Aldrich 
Syndrome protein homolog 
[Mus musculus]



(AF044574) putative 
peroxisomal 2,4-dienoyl-CoA
4105269 reductase [Rattus norvegicus]



<NONE> <NONE> 



2136323 



trithorax homolog HTX - human
(fragment) homolog=MLL
{alternative splicing, clone 14p-
18B}



<NONE> 



1.00 



8e-50 



2.0 



6e-15 



<NONE> 



0.94 








Nearest Neighbor (BlastN vs. Genbank) 


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






Homo sapiens 










1336


AF053455 


tetraspan TM4SF 
(TSPANo) gene, 
complete cds 


7e-35 


3152703 


(AF065389) tetraspan NET-4 
[Homo sapiens] 


le-25 


1337 


X58374 


D.melanogaster crn
mRNA 


3e-35 


117478 


CROOKED NECK PROTEIN 


6e-41 


1338 


AF086492 


Homo sapiens full 
length insert cDNA 
clone ZD95D11 


9e-36 


2909809 


(AF031328) aminoglycoside 6'- 
N- acetyl transferase It 


1.9 


1339


Z96223 


H. sapiens telomeric 
DNA sequence, clone 
12PTEL120, read 
12PTELOO120.seq 


3e-36 


2408068 


(Z99165) hypothetical protein 


0.61 


1340 


Z37986 


H.sapiens mRNA for 
phenylalkylamine 
binding protein. 


1e-36


1362793 


emopamil -binding protein - 
human >gi|780263 


5e-11






Human ribosomal 
protein S27 mRNA, 

complete cds and
similar to similar to
metallopanstimulin 1












U57847 


> :: 

gb|AA316327|AA316 

jit col looUol ML.L 

cell line (matastasis to 
liver in mouse) II 
Homo sapiens cDNA 
5' end similar to 
similar to 

metallopanstimulin 1 


3e-37 


1171014 


40S RIBOSOMAL PROTEIN

S27 growth factor-inducible zinc
finger protein MPS-1 - human
>gi|431319 (L19739)
metallopanstimulin [Homo
sapiens] >gi|1373421 (U57847)
ribosomal protein S27 


1.4 


1342


Y15054


Rattus norvegicus
mRNA for 70 kDa
tumor specific 
antigen, partial 


3e-37 


3123027 


70 KD WD-REPEAT TUMOR- 
SPECIFIC ANTIGEN 

>°)\Zj\jj7D /|gni|rliJje jj jyyz. 

(Y15054) 70 kD tumor-specific
antigen [Rattus norvegicus] 


2e-15 


1343 


AF084205 


Rattus norvegicus 
serine/threonine 
protein kinase TAO1
mRNA, complete cds 


3e-37 


3452473 


(AF084205) serine/threonine 
protein kinase TAO1 [Rattus
norvegicus]


5e-4~ 


1344


X78604 


R. norvegicus 
(Sprague Dawley) 
ARL5 mRNA for 
ARF-like protein 5 


le-37 


<NONE> 


<NONE> 


<NONE>








Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ
ID


ACCESSION


DESCRIPTION


P VALUE


ACCESSION 


DESCRIPTION 


P VALUE 
















1345 


AJ236644 


Homo sapiens 
chromosome 22 CpG 
island DNA, genomic
MseI fragment, clone
22CGIB49A3,
complete read 


le-37 


2239219 


(Z97210) hypothetical protein 


6e-05 


1346 


U09367 


Human zinc finger 
protein ZNF136 


4e-39 


2137269 


DNA-binding protein - mouse
>gi|437444


7e-23 


1347 


Z69649


Human DNA
sequence from
cosmid L69F7B,
Huntington's Disease
Region, chromosome
4p16.3 contains
Huntington Disease
(HD) gene.


3e-39 


3096918


(AL023094) putative cyclase 
associated protein CAP 
[Arabidopsis thaliana] 


5.6 


1348 


AF065389 


Homo sapiens 
tetraspan NET-4 
mRNA, complete cds 


le-39 


3152703 


(AF065389) tetraspan NET-4 
[Homo sapiens]


6e-29 


1349 


AF038172 


Homo sapiens clone 
23923 mRNA 
sequence 


1e-40


1813464 


(U608S3) CapC [Bacillus 
firmus]


2.8 


1350 


Z83095 


H. sapiens Fanconi 
anaemia group A 
gene, exons 39, 40, 
41, 42 and 43 


— Te^tO — 


2137870 


zinc finger protein - mouse 
(fragment) 




1351 


AF057734 


Homo sapiens 17- 
beta-hydroxysteroid 
dehydrogenase IV 
(HSD17B4) gene, 
exon 16 


1e-40


2842416 


(AL008730) dJ487J7.U 
(putative protein dJ487J7. 1 
isoform 1) [Homo sapiens] 


6e-61 


1352 


AF070567 


Homo sapiens clone 
24544 beta-
dystrobrevin mRNA,
partial cds 


4e-41 


3133087 


(Y15718) dystrobrevin B DTN-
B2 [Homo sapiens]


7e-13 


1353 


AF006088 


Homo sapiens Arp2/3 
protein complex 
subunit p16-Arc
(ARC16) mRNA,
complete cds 


2e-41 


3121767 


ARP2/3 COMPLEX 16 KD 
SUBUNIT 


3e-36 


1354 


X69942 


M.musculus mRNA 
of enhancer-trap- 
locus 1 


6e-42 


2291152 


(AF0164IS) No definition line 
found [Caenorhabditis elegans]


6.4






Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION


DESCRIPTION 


P VALUE 














1355 




H.sapiens mRNA for 
beta-catenin


5e-42 


1373019 


(U28811) cysteine-rich
fibroblast growth factor receptor 


8e-05 


1356 


AB018268 


Homo sapiens mRNA 

for KIAA0725

protein, partial cds


5e-42 | 3882171


(AB018268) KIAA0725 protein
[Homo sapiens]


2e-33 






Human cathepsin E 
(CTSE) gene, exon 9 
and complete cds.


2e-42 


<NONE> 




<NONE>


1358 


U80776 


Human EST clone 
NIB1543 mariner
transposon Hsmar1
orf gene, complete 
cds 


2e-42 


2231380 


(U80776) orf; encodes putative
chimeric protein with SET
domain in N-terminus with
similarity to several other
human, Drosophila, nematode
and yeast proteins [Homo
sapiens]


3e-11


1359 


U55184 


Human G protein
Golf alpha gene, exon 
12 and complete cds 


2e-42 


3165531 


(AF067608) No definition line 
found [Caenorhabditis elegans] 


1e-16


1360 


AC005190 


Homo sapiens PAC 

from Xq23, complete 
sequence [Homo
sapiens] 


6e-43 


2978255 


(Aduu/4u/ ) myeloid zinc finger
protein-2 [Mus musculus] 


2.3 


1361 


AB018284


Homo sapiens mRNA
for KIAA0741
protein, complete cds 


5e-43 


<NONE> 


<NONE> 


<NONE> 


1362 


AB011137 


Homo sapiens mRNA 
for KIAA0565 
protein, complete cds 


5e-43 


3043654 


(AB011137) KIAA0565 protein 
[Homo sapiens] 


1e-07


1363 M93651


T-Tnmnn Qpr cp rv 

complete cds. 


2e-43 


<NONE> 


<NONE> 


<NONE> 


1364 


Z47087


H. sapiens mRNA for
RNA polymerase II 
elongation factor-like 
protein. 


2e-43 


1872514 


(U84404) E6-associated protein
E6-AP/ubiquitin-protein ligase
[Homo sapiens] >gi|2361031
(AF01670S) E6-AP ubiquitin-
protein ligase [Homo sapiens]


7.2 


1365 


U27197


Drosophila
melanogaster pelota
(pelo) mRNA,
complete cds 


2e-43 


1352736 


PELOTA PROTEIN >gi|973224
(U27197) pelota [Drosophila
melanogaster]


1e-46



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 




DESCRIPTION 




ACCESSION


P VALUE












RRP5 PROTEIN HOMOLOG 




1366 


U80U07 


Human mRNA for 
KIAA0185 gene, 
partial cds 


6e-44


2498864 


(KIAA0185) hypothetical 
protein YM9959.11C of
S.cerevisiae. [Homo sapiens]


6e-09 


1367 




Homo sapiens 
secretory carrier 
membrane protein 
(SCAMP3) mRNA, 
complete cds 


6e-44


2232243


(AF005039) secretory carrier 
membrane protein [Homo 
sapiens] 


2e-09


1368 


X68101 


R.norvegicus trg 
mRNA 


2e-44 


550420 


(X68101) trg gene product 
[Rattus norvegicus] 


1e-37


1369 


AF044206 


Homo sapiens 
cyclooxygenase 
(COX-2) gene, 
promoter and exon 1 


2e-45 


2072953 


(U93565) putative p150 [Homo
sapiens] 


5e-06 


1370 


L48708 


Homo sapiens 
faciogenital dysplasia
(FGD1) gene, 5' end
of intron 17 


8e-46 


<NONE> 


<NONE> 


<NONE> 






Human COX VIIa-L 
mRNA for liver- 
specific cytochrome c 
oxidase (EC 1.9.3.1.)






PHT VP PPT TP) P \/TT A I I VPO 

PRECURSOR 
>gi|2144370|pir||OSHU7L 
cytochrome-c oxidase (EC 
1.9.3.1) chain VIIa precursor,
hepatic - human >gi|30147
(X15822) precursor (AA -23 to 
60) [Homo sapiens] 


5e-13


1372 


U47323 


Mus musculus
stromal cell protein 
mRNA, complete cds 


3e-46 


1493833 


(U47323) stromal cell protein 
[Mus musculus]


le-4S 


1373 


AF059524 


Homo sapiens
reticulon gene family 
protein 


7e-47 


1731169 


HYPOTHETICAL 113.1 KD
PROTEIN T28D9.7 IN
CHROMOSOME II >oil86P64
(U28738) coded for by C.
elegans cDNA yk8h5.3; coded
for by C. elegans cDNA
yk8h5.5; similar to C. elegans
deg-1 and mec-4 in exon 2
[Caenorhabditis elegans]


7.S 


1374 


AJ132583 


Homo sapiens mRNA
for puromycin
sensitive 
aminopeptidase, 
partial 


3e-47 


1777519 


(U39123) T cell receptor beta 
chain [Homo sapiens] 


9.7 



Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 
















1375 


M97856 


Homo sapiens histone 
binding protein 
mRNA, complete cds. 


3e-47 


2645327 


(U83821) NADH 
dehydrogenase subunit 3 
[Oryzomys palustris]


5.7 


1376 


U53220 


Human 

retinoblastoma- 
related Rb2/p130
gene, 5' flanking
region and partial cds 


3e-47 


2499225 


CMP-SIALIC ACID
TRANSPORTER CMP- sialic 
acid transporter [Cricetulus 
griseus] 


5.3 


1377 


X87870 


H.sapiens mRNA for 
hepatocyte nuclear 
factor 4a 


1e-47


728832 


!!!! ALU SUBFAMILY SB 
WARNING ENTRY 


7.3 


1378 


AF060195 


Mus musculus 
proteasome regulator 
PA28 beta subunit 
gene, complete cds 


3e-48 


478681 


limb deformity protein - chicken 


0.25 


1379 


AB018285


Homo sapiens mRNA
for KIAA0742
protein, partial cds


1e-48


3122969 


item SPSOMC PROTEIN 
A (ZINC FINGER PROTEIN 
TSGA) >gi|281040|pir||S28499 
probable zinc finger protein - rat 
>gi|57504 (X59993) zinc finger 
protein 


1e-30


1380


U35032 


Human endogenous 

retrovirus clone 
c5.11, HERV-H 
multiply spliced 
subgenomic leader, 
protease and integrase 
region mRNA, partial 
cds 


4e-49 


88558 


retroviral proteinase-like protein 
- human 


6e-05 


1381 


AB007956 


Homo sapiens
mRNA, chromosome

specific transcript
KIAA0487


1e-49


<NONE> 


<NONE> 


<NONE> 


1382 


D86987 


Homo sapiens mRNA 
for KIAA0214
protein, complete cds


1e-49


2497944


ALPHA SCRUIN >gi|633238 
(Z38132) scruin [Limulus 
polyphemus]

>gi|1093326|prf||2103269A 
scrulin [Limulus sp.] 


9.7 


1383 


U25826 


Human transcription
factor (SCI) gene,
complete cds.


4e-50


<NONE> 


<NONE> 


<NONE>






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



P VALUE 



ACCESSION 



1384 U46690 



1385 AF072128



1386 AF093593 



Mus musculus ATP- 



1387 U79745 



1388 L09647 



dependent RNA 

helicase mRNA, 

partial cds.

Mus musculus

claudin-2 mRNA,

complete cds
Homo sapiens

snRNA activating

protein complex

19kDa subunit

(SNAP19) mRNA, 

complete cds 

Homo sapiens 
monocarboxylate 
transporter 
homologue MCT6 
mRNA, complete cds 



Rattus norvegicus
hepatocyte nuclear 
factor 3a 



DESCRIPTION 



4e-50 



2e-50 



1335873 



3335184 



P VALUE 



(U46690) ATP-dependent RNA 
helicase [Mus musculus] 

(AF072128) claudin-2 [Mus
musculus] 



1e-50



3668416



1e-50



1177607



1e-50



404764 



(AF093593) snRNA activating 
protein complex 19kDa subunit 
[Homo sapiens]



(X92485) pval [Plasmodium 
vivax] 



(L10409) fork head related
protein [Mus musculus] 



3e-24 



4e-24 



0.003 



2e-07 



2e-21 



1389 X61506 



Mouse E46 mRNA 
for E46 protein 



Human debrisoquine 
4-hydroxylase 



4e-51 



114909 



BRAIN PROTEIN E46 



LINE- 1 REVERSE 
TRANSCRIPTASE 
HOMOLOG protein 
5 eoueang] 



1e-20



5e-15 



1391 AF019767 



Homo sapiens zinc 
finger protein (ZPR1)
mRNA, complete cds 



4e-52 



961507 



(D63788) anchor protein, LCM 



1392 Z37986 



H. sapiens mRNA for 
phenylalkylamine
binding protein.



2e-52 



5.9 



<NONE> 



<NONE> 



1393 U65416 



Human MHC class I 
molecule (MICB) 
gene, complete cds 



2e-52 



3878637 



£4yi28) weak similarity with 
SINR protein {Swiss Prot 
accession number P06533); 
cDNA EST EMBL:T0O631 
comes from this gene; cDNA 
EST yk293dl0.5 comes from 
this gene [Caenorhabditis 
elegans]



<NONE> 






Nearest Neighbor (BlastN vs. Genbank) 



SEQ 

ID 1 ACCESSION 



1394 



DESCRIPTION 



Z57647 



1395 L13738



1396 



1397 



1398 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



H.sapiens CpG DNA
clone 189a6, forward
read cpg189a6.ft1a



Human activated 
p21cdc42Hs kinase
(ack) mRNA,
complete cds. 



AF042379 



AF047441 



AF104670



Homo sapiens spindle 
pole body protein 
spc97 homolog GCP2
mRNA, complete cds 



Homo sapiens RNA 
polymerase I 40kD
subunit mRNA, 
complete cds 



Homo sapiens cell 
cycle protein 
(PA2G4) gene, exons 
6 through 13, and 
complete cds 



1399 



1400 



S60754 



D86972



VNTR locus DXZ4, 
hypervariable tandem 
repeat cluster J 
[human, Genomic, 
2991 nt] > :: 
gb|L07935|HUMVNT 
RA Homo sapiens 
microsatellite VNTR 
DNA sequence. 



Human mRNA for 
KIAA0218 gene,
complete cds 



2e-52 



2e-52 



7e-53 



6e-53 



2e-53 



2e-53 



1e-53



111187 



2921447 



2801701 



3914807 



<NONE> 



DESCRIPTION 



1209669 



3426041 



beta-globin DNA-binding



P VALUE 



protein B1, transcription factor
PU.1 - mouse >gi|200586
(M32370) PU.1 protein [Mus
musculus] >gi|200972
(M38252) transcription factor
Pu.1 [Mus musculus]



5.8 



(AF037260) non-receptor 
protein tyrosine kinase Ack 
[Mus musculus] 



7e-23 



(AF042379) spindle pole body 
protein spc97 homolog GCP2 | 1e-16

DNA-DIRECTED RNA

POLYMERASE I 40 KD
POLYPEPTIDE (RPA40)
(RPA39) >gi|2266929
(AF008442) RNA polymerase I
subunit hRPA39 [Homo
sapiens] | 4e-19



<NONE> 



<NONE> 



(U38810) CAGR1 [Homo 
sapiens] >gi|3098420 
(AF040945) homeotic regulator 
homolog MAB21 [Mus
musculus] 



(AC005168) unknown protein 
[Arabidopsis thaliana] 



4.6 



9.1 






Nearest Neighbor (BlastN vs. Genbank)



SEQ 

ID I ACCESSION 



DESCRIPTION 



1401 | AJ236682



1402 | AJ236682



1403 | M37583



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



P VALUE I ACCESSION 



Homo sapiens
chromosome 22 CpG
island DNA, genomic
Msel fragment, clone
22CGIB49E6,
complete read



Homo sapiens 
chromosome 22 CpG 
island DNA, genomic
Msel fragment, clone
22CGIB49E6,
complete read 



7e-54 



1404 | AJ009947



1405 | Y08459



1406 | AF042384



1407 | AF042379



Human histone
(H2A.Z) mRNA, 

complete cds. 

Homo sapiens mRNA 
for putative ATPase, 
partial 



6e-54 



B.taurus mRNA for 
novel cytoplasmic 
protein 



Homo sapiens BC-2 
protein mRNA, 
complete cds 



Homo sapiens spindle 
pole body protein 
spc97 homolog GCP2 
mRNA, complete cds 



6e-54 



6e-54 



2e-54 



2e-54 



1408 | AF005355



1409 | AF008442



1410 | AF047441



Oryctolagus 
cuniculus translation 
initiation factor 
eIF2C mRNA,
complete cds 



Homo sapiens RNA 
polymerase I subunit 
hRPA39 mRNA, 
complete cds 



Homo sapiens RNA 
polymerase I 40kD 
subunit mRNA, 
complete cds 



8e-55 



7e-55 



3e-55 



3e-55 



DESCRIPTION 



70711 



<NONE> 



2828147 



2801701 



3253159 



3335138 



P VALUE 



(AL034355) putative 
cytochrome oxidase subunit I
3928721 [Streptomyces coelicolor] | 0.30



(AL034355) putative 
cytochrome oxidase subunit I 
[Streptomyces coelicolor]



0.28 



histone H2A.F, embryonic - 
chicken 



2e-16 



(AJ009947) putative ATPase
3550295 [Homo sapiens] | 3e-18



<NONE> 



<NONE> 



(AF042384) BC-2 protein 
[Homo sapiens]



2e-14 



3335138 



(AF042379) spindle pole body 
protein spc97 homolog GCP2



2e-17 



(AF005355) translation 
initiation factor eIF2C 



3e-53 



(AF047441) RNA polymerase I
40kD subunit [Homo sapiens] | 3e-20 



(AF047441) RNA polymerase I
40kD subunit [Homo sapiens] | 3e-20






Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 


SEQ 
ID 


ACCESSION 


DESCRIPTION 


P VALUE 


ACCESSION 


DESCRIPTION 


P VALUE 






Human mRNA for 








1411 


X08004 



Rap1B protein > ::
emb|A08693|A08693
H.sapiens rap1b
cDNA 


2e-55 | 539995


transforming protein rap1b - rat
(strain Copenhagen) 


2e-18


1412 


AF010403


Homo sapiens ALR 
mRNA, complete cds


2e-55 


2358285 


(AF010403) ALR [Homo
sapiens] 


1e-49


1413 


M77016 


Human tropomodulin 
mRNA, complete cds. 


8e-56 | 262249


(S52010) orf 1 5' of EpoR [mice, 
Peptide, 85 aa] [Mus sp.] 


0.027 


1414 


AB020633 


Homo sapiens mRNA 
for KIAA0826
protein, partial cds 


2e-56 


<NONE> 


<NONE> 


<NONE> 


1415 


X87489 


H.sapiens genomic 
DNA (chromosome 
3; clone NL1243D)


2e-56 


1814029 


(U84501) cuticle collagen 
[Caenorhabditis briggsae] 


0.038 


1416 


AB007893 


Homo sapiens 
KIAA0433 mRNA, 
partial cds 


2e-56 


^887437 


(AB007893) KIAA0433 [Homo 
sapiens] 


9e-21


1417 


X78925 


H.sapiens HZF2 
mRNA for zinc finger 
protein 


1e-56


3342002 


(AF054180) hematopoietic cell 
derived zinc finger protein 
[Homo sapiens] 


2e-21 


1418 


Z56281


H.sapiens mRNA for
interferon regulatory
factor 3


9e-57 


2497442 


INTERFERON 
REGULATORY FACTOR 3 
factor 3 [Homo sapiens] 


2e-21 


1419 


U78772 


Homo sapiens nuclear
VCP-like protein
NVLp.1


8e-57 


2406565 


(U68140) nuclear VCP-like 
protein NVLp.2 [Homo sapiens] 


5e-20 


1420 


D79994 


Human mRNA for
KIAA0172 gene, 
partial cds 


3e-57 


1136404


(D79994) similar to ankyrin of 
Chromatium vinosum. [Homo 
sapiens] 


9e-3S 


1421 


AB002342 


Human mRNA for
KIAA0344 gene,
complete cds 


1e-57


2224629 


(AB002342) KIAA0344 [Homo 
sapiens] 


4e-20 


1422 


L19437


Human transaldolase
mRNA containing
transposable element,
complete cds


1e-57


1553119 


(U63159) transaldolase [Mus
musculus] 


2e-20 


1423 


D17532 


Human mRNA for
RCK, complete cds


9e-58 


129376 


PROBABLE ATP- 
DEPENDENT RNA 
HELICASE P54 (ONCOGENE 
RCK) (DEAD BOX PROTEIN 

3) 


1e-10








Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ 














ID


ACCESSION


DESCRIPTION


P VALUE


ACCESSION


DESCRIPTION


P VALUE






H.sapiens BDP1


1424 


X79568 


mRNA for protein-

tyrosine-phosphatase

H.sapiens BDP1


Q„ co 1 


1871531 


(X79568) protein-tyrosine- 
phosphatase 


le-22 


1425


X79568 


mRNA for protein-

tyrosine-phosphatase
Homo sapiens 

mvjii.j rniviNA lor 

GDS-related protein.


9e-58


1871531


(X79568) protein-tyrosine- 
phosphatase 


9e-23 


1426


AB012295




7- co 1 


204502 1 


(Z97184) RGL2 [Homo sapiens]


9e-19 


1427


AF086040 


Homo sapiens full
length insert cDNA 
clone YX52E07 


1e-58


543222 


glutamine (Q)-rich factor 1, 
QRF-1 - mouse factor 1, QRF-1 
fmice B-cell ltnjkpmii RPT 1 
Peptide Partial, 84 aa]


3e-36






Homo sapiens ca xi 














mRNA for carbonic














anhydrase-related 










1428 


AB018195


ft 1 

cds

Mus musculus IRE1


4e-59 


<NONE>


<NONE> 


<NONE>


1429


AF071777 


(Ire1) mRNA,
complete cds


4e-59


3766209 


(AF071777) IRE1 [Mus
musculus] 


7e-2S J 






Homo sapiens mRNA














for SH3 binding

w 1 










1430


AB000462 


Homo sapiens clone


la CO 1 


<NONE>


<NONE> 


<NONE>


1431


AF038172 


23923 mRNA 
sequence


3e-59


3758855 


(29855DMAL3P6.il 
Plasmodium falciparum] 


1.3 




















sec. ue nee from nha^e 1 














p» ILL I1UIIJ iX Cyilllg 1 














from the tip of the 




_ 










short arm of














chromosome 16,










1432


Z84812


spanning 2Mb of 
16p13.3 Contains
ESTs 


1e-59


* * * 

400927 


RIBONUCLEOPROTEIN 
RB97D ribonucleoprotein
[Drosophila melanogaster]


25 




I 


1433


U364S4


Human laminin-
binding protein gene,
partial cds, and E2
small nucleolar RNA
gene, complete
sequence


1e-59


226005


protein 40kD [Mus musculus]


7e-0? 1 



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Nearest Neighbor (BlastN vs. Genbank) 



SEQ 
ID 



1434 



1435 



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



Homo sapiens ERK
activator kinase
L11285 | (MEK2) mRNA.



Homo sapiens full 
length insert cDNA 
AF086555 clone ZE14E04 



1436 M24766 



Human (clone
pHAIV2-12) alpha-2
collagen type IV



1437 



1438 



X65550 



H.sapiens mki67a
mRNA (long type) 
for antigen of 
monoclonal antibody 
Ki-67 



Human calmodulin 
M27319 | mRNA, complete cds.



1439 



1440 



Homo sapiens mRNA 
for transducin (beta) 



Y12781 like 1 protein 



Human mRNA for
KIAA0385 gene,
AB002383 | complete cds



1441 



1442 



1443 



Homo sapiens clone 
24732 unknown 
AF070614 | mRNA, partial cds



AB002326 



Human mRNA for 
KIAA0328 gene, 
partial cds 



AF086471 



1444 AB002311 



Homo sapiens full 
length insert cDNA 
clone ZD88A01 



Human mRNA for 
KIAA0313 gene, 
complete cds 



1e-59



4e-60 



ACCESSION 



DESCRIPTION 



4e-60 



4e-60 



4e-60 



3e-60 



1e-60



2e-61 



6e-62



5e-62 



2e-62 



2499630 



3287674 



29551 



1170654



1345451 



62133 



547891 



<NONE> 



DUAL SPECIFICITY



P VALUE 



MITOGEN-ACTIVATED
PROTEIN KINASE KINASE 2
(MAP KINASE KINASE 2)
(MAPKK 2) kinase type 2
[Gallus gallus]



(AC005239) F23I49.1 [Homo 
sapiens]



(X05610) alpha (2) chain 
[Homo sapiens] 



ANTIGEN KI-67 
>gi|539555|pir||A48666 cell 
proliferation antigen Ki-67, long
form - human Ki-67 [Homo
sapiens]



(X05949) Calmodulin (AA 2 ■ 
59) (449 is 1st base in codon) 
[Drosophila melanogaster] 



(X06172) put. 134 kD protein 
(AA 1 - 1 187); put, replicase 



(AF070614) unknown [Homo 
3283879 sapiens]



MICROTUBULE-
ASSOCIATED PROTEIN 4
microtubule-associated protein-
U [Bos taurus]



<NONE> 



3e-21 



2e-04 



6e-15



3e-15 



7e-20 



7.4 



1001548 (D64000) hypothetical protein | 4.4



3e-17 



5.6 



2506357 





DIHYDROXYPHENYLPROPI
ONATE 1,2-DIOXYGENASE
>gi|1657544 (U73857) similar
to mcpl gene (catechol 2,3-
dioxygenase) of A. eutrophus 3-
(2,3-

dihydroxyphenylpropionate)1,2-
dioxygenase 2,3-

dihydroxyphenylpropionate 1,2-
dioxygenase



<NONE> 



3.4 






ACCESSION 



1445 



1446 



1447 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



Xenopus laevis 
notchless (nle)
AF069737 | mRNA, complete cds



2e-62 



AF044209 



Homo sapiens nuclear
receptor co-repressor
N-CoR mRNA,
complete cds



M69238 



Human aryl 
hydrocarbon receptor 
nuclear translocator 
(ARNT) mRNA,
complete cds. 



2e-63 



1448 | X80497



1449 | AF031141



H.sapiens PHKLA 
mRNA 



Homo sapiens 
ubiquitin conjugating 
enzyme 



2e-63 



1450 | Z37166



1451 | M64240



H.sapiens BAT1 
mRNA for nuclear 
RNA helicase 



helix-loop-
helix zipper protein 
(max) mRNA, 
complete cds. > :: 
gb|I41138|I41138 
Sequence 1 from 
patent US 5624818 > 

gb|I77062|I77062 
Sequence 1 from 
patent US 5693487 



6e-64 



5e-64 



DESCRIPTION 



2137603 



2702319 



P VALUE 



(AF069737) notchless [Xenopus
laevis] 



nuclear receptor co-repressor N- 
CoR - mouse musculus]
>gi|1583865|prf||2121436A
thyroid hormone receptor co- 
repressor [Mus musculus] 



le-55 



1170685 



(AF001307) aryl hydrocarbon 
receptor nuclear translocator; 
Arnt [Homo sapiens] | 5e-19

KINASE ALPHA 
REGULATORY CHAIN, 
LIVER ISOFORM 
(PHOSPHORYLASE KINASE
ALPHA L SUBUNIT)
>gi|663010 (X80497)
phosphorylase kinase 
phosphorylase kinase alpha 
subunit [Homo sapiens] | 5e-22 



(AF031141) ubiquitin 
conjugating enzyme [Homo 

sapiens] 



2500529 



88175 



PROBABLE ATP- 
DEPENDENT RNA 
HELICASE P47
>gi|2135840|pir||I37201 nuclear
RNA helicase (DEAD family) 
BAT1 - human >gi|587146 
(Z37166) nuclear RNA helicase 
(DEAD family) [Homo sapiens] 



Myc-binding factor Max, short 
form - human 



9e-24 



8e-22 






Nearest Neighbor (BlastN vs. Genbank)


Nearest Neighbor (BlastX vs. Non-Redundant Proteins)


SEQ
ID


ACCESSION


DESCRIPTION


P VALUE


ACCESSION


DESCRIPTION


P VALUE


1452 


M98252 


Homo sapiens lysyl 
hydroxylase (partial 
clone 2.2 Kb LH) 
RNA, complete 
mature peptide. 


2e-64 


400205 


PROCOLLAGEN-LYSINE,2-
OXOGLUTARATE 5-
DIOXYGENASE
PRECURSOR (LYSYL
HYDROXYLASE) lysyl
hydroxylase [Homo sapiens]


7e-21


1453 


U09550 


Human oviductal

glycoprotein mRNA, 
complete cds. 


8e-65 


2493676 


OVIDUCT-SPECIFIC
GLYCOPROTEIN 
PRECURSOR (OVIDUCTAL 
GLYCOPROTEIN) 
(OVIDUCTIN) 


1 

1 

2e-tl . 


1454 


X67877 


R.norvegicus mRNA 

resiniferatoxin- 
binding protein 


7e-65 


423664 


resiniferatoxin-binding protein
RBP-26, cytosolic - rat
>gi|311660 (X67877) cytosolic
resiniferatoxin binding protein
RBP-26 [Rattus norvegicus]
>gi|1093373|prf||2103310A
resiniferatoxin-binding protein
[Rattus norvegicus]


2e-40 


1455 


AB018254 


Homo sapiens mRNA

for KIAA0711
protein, complete cds 


6e-65 


92298 


glutamine/glutamic acid-rich 
protein 


0.9S J 


1456 


J03607 


Human 40-kDa 
keratin intermediate 
filament precursor 
gene. 


3e-65 


1070608 


keratin 19, type I, cytoskeletal - 
human sapiens]


4e-07 


1457


U65896 


Human gamma- 
glutamyl carboxylase 
gene, complete cds 


2e-65 


<NONE> 


<NONE> 


<NONE>


1458 


U07681


Human NAD(H)-
specific isocitrate
dehydrogenase alpha
subunit precursor
mRNA, complete cds.


2e-65


1708399


DEHYDROGENASE (NAD),
MITOCHONDRIAL SUBUNIT
ALPHA PRECURSOR
(ISOCITRIC

DEHYDROGENASE) (NAD+-
SPECIFIC ICDH)
dehydrogenase alpha chain
precursor - human >gi|706S39
subunit precursor [Homo
sapiens]




1459 


U88080


Human zinc finger
protein (LD5-1) gene,
exons 4, 5 and 6, and
complete cds


2e-65 


1373394


(U57796) zinc finger protein
[Homo sapiens] >gi|2306773


2e-39 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



1460 | M96625



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



DESCRIPTION 



Gallus domesticus
tensin mRNA 
sequence. 



3e-66 



Mus musculus myelin 
gene expression 
factor (MEF-2) 
1461 | U13262 | mRNA, partial cds.



Mus musculus Tera 
(Tera) mRNA, 
1462 | U64033 | complete cds



1465 



1466 



1467 | U91582



1468 | X06762



Macaca fascicularis 
UDP- 

a] 

se mRNA, complete 
cds 



1469 | AB016930



1470 | X74504



Mouse Hox2.3 
mRNA 



Cricetulus griseus 
mRNA for 
Phosphatidylglycerop 
hosphate synthase, 
complete cds 



M.musculus T10 
mRNA 



1e-70



M.musculus mRNA 

1463 | X78939 | for testin | 6e-74

Mus musculus Tera
(Tera) mRNA,
1464 | U64033 | complete cds | 2e-74



Canis familiaris UDP
N-acetylglucosamine
transporter mRNA,
AF057365 | complete cds | 9e-79

Rattus norvegicus
mRNA for coronin-
AJ006064 | like protein | 1e-82



4e-89 



3e-92 



5e-94 



7e-97 



2134419 



536926 



tensin - chicken (fragment) 



P VALUE



>gi|63805 (Z18529) tensin
[Gallus gallus] >gi|212755
(L06662) tensin [Gallus gallus]



(U13262) myelin gene
expression factor [Mus 
musculus] 



5e-72 | 1575505



1351218 



1575505 



3298605 



3757680 



(U64033) Tera [Mus musculus] 



TESTIN 2 (TES2) 
[CONTAINS: TESTIN 1 



140396 



123255 



4159682 



1711658 



KAR4 yeast (Saccharomyces 
cerevisiae) 



1e-51



9e-42 



9e-34 



8e-31 



(U64033) Tera [Mus musculus] 5e-37 



(AF057365) UDP N-
acetylglucosamine transporter

[Canis familiaris] | 9e-10

(AJ006064) coronin-like protein

[Rattus norvegicus] | 3e-62



le-OS 



HOMEOBOX PROTEIN HOX- 
B7 (HOX-2C ) | 9e-2j 



(AB016930) 

Phosphatidylglycerophosphate 
synthase [Cricetulus griseus] 



SER/THR-RICH PROTEIN 
T10 IN DGCR REGION
>gi|480900|pir||S37488 gene
TIP protein - mouse 



7e-34 



3e-59 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



P VALUE | ACCESSION



DESCRIPTION 



1471



U13175 



1472 



S79873 



Rattus norvegicus 
clone ubclOa 
ubiquitin conjugating
enzyme (E217kB) 
mRNA, complete cds 




h-lamp-2=lysosome-
associated membrane
protein-2 protein-2b
(LAMP2) mRNA,
alternatively spliced

form h-lamp-2b,
complete cds. 



1473 D13623



1474 



1475 



AB013357 



AB016930



1476 U38253 



Rat mRNA for p34 
protein, complete cds 



Cricetulus griseus 
mRNA for 
Phosphatidylglycerop 
hosphate synthase, 
complete cds 



Rattus norvegicus 
initiation factor eIF-
2B gamma subunit 
(eEF-2B gamma) 
mRNA, complete cds 



3e-98 



1351345 



e-119



<NONE> 



e-112



480379 



Mus musculus mRNA
for 49 kDa zinc finger
protein, complete cds | e-136



4153886 



e-117



4159682 



e-103



2494312 



UBIQUITIN-CONJUGATING

ENZYME E2-17 KD 3

(UBIQUITIN-PROTEIN
LIGASE) (UBIQUITIN
CARRIER PROTEIN)
(E2(17)KB 3)
>gi|1085588|pir||S53358
ubiquitin conjugating enzyme
(E217kB) - rat >gi|595666
(U13175) ubiquitin conjugating
enzyme [Rattus norvegicus]
norvegicus] >gi|1145691
(U39318) UbcH5C [Homo 
sapiens] 



P VALUE 



5e-05 



<NONE> 



<NONE> 



ribosome- binding protein p34 
rat sp.] 



2e-05 



(AB013357) 49 kDa zinc finger 
protein 



5e-08 



(AB016930)

Phosphatidylglycerophosphate 
synthase [Cricetulus griseus] 



4e-32 



TRANSLATION INITIATION 
FACTOR EIF-2B GAMMA 
SUBUNIT (EEF-2B GDP-GTP 
EXCHANGE FACTOR) 
subunit [Rattus norvegicus] 



3e-42 






SEQ 
ID



Nearest Neighbor (BlastN vs. Genbank)



ACCESSION 



DESCRIPTION 



1477 | X73683



1478 | U32498



1479 | U41736



1480 | AF041338



1481 | AF064553



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



1482 | AB000517



1483 D38517 



R.norvegicus mRNA 
for histone H3.3 | e-117 
Rattus norvegicus 
rsec8 mRNA, partial 
cds | e-108

Mus musculus ancient
ubiquitous 46 kDa
protein AUP1

mRNA, complete cds | e-146



122075 



2143962 



1517822 



Bos taurus vacuolar 
proton pump subunit 
SFD alpha isoform 
(SFD) mRNA, 
complete cds 

Mus musculus NSD1 
protein mRNA, 
complete cds 



Rattus sp. mRNA for 
CDP-diacylglycerol 
synthase, complete 
cds 



Mouse mRNA for 
Dhml protein. 
complete cds 



e-119



2895578 



e-121 



3329465 



e-146 



1517822 



e-118



2137562 



DESCRIPTION 



HIU 1 KJi'L. I I J. J (UJ.UJ 

(\\2 i ni h* li n n tp " s 

(Drosophila melanogaster) 
histone H3.3B - chicken
>gi|2119023|pir||S61218 histone
H3.3 - fruit fly (Drosophila 
hydei) 1-136) [Oryctolagus 
cuniculus] >gi|8046 (X53822) 
Histone H3.3Q gene product 
[Drosophila melanogaster] 
>gi|51198gallus] >gi|161190 
(M17876) histone H3 [Spisula
solidissima] >gi|2H853
(M11393) histone 3.3 [Gallus
gallus] >gi|306848 (M11354)
H3.3 histone [Homo sapiens]
melanogaster] >gi|963031
(X81205) histone H3.3 H3.3A
variant [Drosophila 
melanogaster] musculus] 
rsec8 - rat (fragment)
>gi|1019441 (U32498) rsec8
[Rattus norvegicus] 



P VALUE 



(U41736) ancient ubiquitous 46 
[Mus musculus]



(AF041338) vacuolar proton
pump subunit SFD alpha
isoform [Bos taurus]



(AF064553) NSD1 protein 
[Mus musculus]



mouse Dhml protein - mouse 
musculus]



1e-45



7e-48 



5e-49 



3e-49 



2e-50 



(U41736) ancient ubiquitous 46 
kDa protein AUP46 precursor 
[Mus musculus] | 2e-51 



6e-54 






SEQ 



Nearest Neighbor (BlastN vs. Genbank)



ID | ACCESSION



DESCRIPTION 



1484 X54352 



M.domesticus MD6



mRNA 



1485 U57692 



1486



Mus musculus N- 
terminal asparagine 
amidohydrolase 
(Ntan1) mRNA,
complete cds 



X80169



M.musculus mRNA 
for 200 kD protein 



1487 U57692 



1488 | U08215



1489 | D85926



1490 



1491 



Mus musculus N- 
terminal asparagine
amidohydrolase
(Ntan1) mRNA,
complete cds 



Mus musculus Hsp70- 
related NST-1 (hsr.l) 
mRNA, complete cds 



Mouse mRNA for 
Ray, complete cds 



L20427 



X56044 



Rattus norvegicus 

dihydroxypolyprenylb 
enzoate 

methyltransferase 
mRNA, complete cds 



M.musculus mRNA 
for protein Htf9C 



e-139 



e-118



e-119



e-120 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



P VALUE | ACCESSION



DESCRIPTION 



1085499 



CDC4 repeat unit-containing 



P VALUE 



2498797 



2498797 



e-109 



e-110



473407 



1944389 



e-123



protein - mouse



PROTEIN N-TERMINAL
ASPARAGINE
AMIDOHYDROLASE
(PROTEIN NH2-TERMINAL
ASPARAGINE DEAMIDASE)
(NTN-AMIDASE) (PNAD)
(PROTEIN NH2-TERMINAL
ASPARAGINE
AMIDOHYDROLASE)
(PNAA) >gi|1373365 (U57691)
N-terminal asparagine
amidohydrolase [Mus musculus]
amidohydrolase [Mus musculus]



le-55 



PROTEIN TSG24 (MEIOTIC
CHECKPOINT
REGULATOR)
1717793 >gi|1083553|pir||A55117 tsg24



PROTEIN N-TERMINAL
ASPARAGINE
AMIDOHYDROLASE
(PROTEIN NH2-TERMINAL
ASPARAGINE DEAMIDASE)
(NTN-AMIDASE) (PNAD)
(PROTEIN NH2-TERMINAL
ASPARAGINE
AMIDOHYDROLASE)
(PNAA) >gi|1373365 (U57691)
N-terminal asparagine
amidohydrolase [Mus musculus]
amidohydrolase [Mus musculus]



5e-57 



9e-58 



457372 



e-121



3183977 



(U08215) NST-1 [Mus
musculus]



(D85926) Ray [Mus musculus]

(L20427)

dihydroxypolyprenylbenzoate
methyltransferase
dihydroxypolyprenylbenzoate
methyltransferase [Rattus 
norvegicus] 



8e-58 



7e-58 



(X56044) protein Htf9C [Mus 
musculus] 



2e-58 



4e-59 






1e-60






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



DESCRIPTION 



P VALUE 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins)



ACCESSION 



p59fyn(T)=OKT3-
induced calcium
1492 | S74774 | influx regulator



Mus musculus BUB2-
like protein 1
(HBLP1) mRNA,
1493 | U88873 | complete cds



Cricetulus griseus HT 
protein mRNA, 
1494 | U48852 | complete cds.



Rattus norvegicus 
rexo70 mRNA, 
1495 | AF032667 | complete cds



Chinese hamster
phosphatidylserine
decarboxylase
1496 | M62722 | mRNA, 3' end.



DESCRIPTION 



P VALUE 



PROTO-ONCOGENE



e-163 



729896 



TYROSINE-PROTEIN
KINASE FYN (P59-FYN)
>gi|420217|pir||A44991 protein
tyrosine kinase (EC 2.7.1.112)
fyn - mouse



e-123 



4099611 



(U88873) BUB2-like protein 1 
[Mus musculus]



e-117 



1216486 



(U48852) HT protein 
[Cricetulus griseus]



e-142 



(AF032667) rexo70 [Rattus 
2827160 norvegicus]



PHOSPHATIDYLSERINE



Mus musculus fatty 
acid transport protein
1497 | AF072758 | 3 mRNA, partial cds



1498 AB005549 



1499 U57344 



Rattus norvegicus
mRNA for atypical
PKC specific binding
protein, complete cds
Mus musculus
homeobox protein
Meis3 mRNA,
complete cds



Mus musculus SKD3 
1500 | U09874 | mRNA, complete cds.



Mus musculus 
muskelin mRNA, 
1501 | U72194 | complete cds



e-114 



118910 



DECARBOXYLASE 
PROENZYME
>gi|109423|pir||A38732
phosphatidylserine
decarboxylase (EC 4.1.1.65)
- Chinese hamster (fragment)



e-130 



(AF072758) fatty acid transport 
protein 3; FATP3 [Mus 
3335567 musculus]



e-113 



(AB005549) atypical PKC 
specific binding protein [Rattus 
3868778 norvegicus]



e-143 



3024124 



HOMEOBOX PROTEIN 
MEIS3 



e-142 



2493735 



SKD3 PROTEIN SKD3 [Mus 
musculus]



e-148 



3493462 



(U72194) muskelin [Mus
musculus]



M.musculus mRNA 
1502 | X80169 | for 200 kD protein



e-155 



PROTEIN TSG24 (MEIOTIC 
CHECKPOINT 
REGULATOR) 
1717793 >gi|1083553|pir||A55117 tsg24



8e-63 



1e-63



7e-64 



5e-66 



2e-67 



1e-67



2e-69 



6e-72 



1e-72



2e-74 



3e-7' 






SEQ 
ID 



Nearest Neighbor (BlastN vs. Genbank) 



ACCESSION 



1503 



DESCRIPTION 



P VALUE 



U72194



Mus musculus 
muskelin mRNA, 
complete cds 



Nearest Neighbor (BlastX vs. Non-Redundant Proteins) 



ACCESSION 



DESCRIPTION 



e-154 



3493462 



(U72194) muskelin [Mus 
musculus] 



P VALUE 



2e-78 



1504 



Y12836



Cricetulus griseus 
mRNA for Zn finger 
factor 



e-146 



3150148 



(Y12836) Zn finger factor 
[Cricetulus griseus]



3e-83 






Table 5 



SEQ ID | Start | Stop | Score | Direction | Description
29 | 295 | 421 | 5872 | For | mkk like kinases
30 | 31 | 182 | 3943 | For | Basic region plus leucine zipper transcription factors
31 | 298 | 397 | 5625 | For | mkk like kinases
186 | 175 | 395 | 7660 | For | SH2 Domain
187 | 358 | 432 | 4320 | For | Ank repeat
196 | 37 | 322 | 6049 | For | mkk like kinases
234 | 23 | 121 | 4607 | For | SH3 Domain
308 | 110 | 172 | 4150 | For | Zinc finger, C2H2 type
410 | 42 | 191 | 4036 | For | Basic region plus leucine zipper transcription factors
431 | 71 | 428 | 5538 | Rev | ATPases Associated with Various Cellular Activities
552 | 116 | 288 | 3930 | Rev | Basic region plus leucine zipper transcription factors
639 | 157 | 561 | 5797 | For | ATPases Associated with Various Cellular Activities
746 | 209 | 427 | 5379 | For | Fibronectin type III domain
807 | 339 | 392 | 3620 | For | Zinc finger, C2H2 type
820 | 341 | 406 | 2930 | Rev | EF-hand
822 | 108 | 262 | 4179 | For | Basic region plus leucine zipper transcription factors
836 | 158 | 353 | 4430 | For | Basic region plus leucine zipper transcription factors
1157 | 41 | 444 | 5279 | Rev | protein kinase
1192 | 186 | 416 | 5469 | For | Fibronectin type III domain
1268 | 238 | 315 | 3540 | For | Ank repeat
1269 | 79 | 240 | 11640 | For | LIM domain containing proteins
1288 | 73 | 234 | 3953 | For | Basic region plus leucine zipper transcription factors






SEQ ID 


Start 


Stop 


Score 


Direction 


Description 


1309 


248 


404 


8226 


for 


LIM domain containing proteins 


1324 


294 


356 



4690 


for 

Zinc finger, C2H2 type


1325 


1 


234 


8981 



for 


C2 domain (prot. kinase C like)


1336 



66 


164 



6390 


for 


WD domain, G-beta repeats


1360 



222 


377 


8686 



for 


LIM domain containing proteins


1365 


69 



257 


5221 


for 

Basic region plus leucine zipper

transcription factors 


1380 


42 


140 


7130 


for 




1386 


243 


398 


8736 


for 


LIM domain containing proteins 


1410 


222 


350 


10553 


for 


Trypsin 


1417 


8 


354 


6073 


for 


Protein Tyrosine Phosphatase 


1454 


49 


209 


3996 


for 


Basic region plus leucine zipper 

transcription factors


1464 


4 


180


4Q78 


for 


RNA recognition motif, (aka RRM,

RBD, or RNP domain) 


1478 


S4 


4^7 




for


protein kinase


1 T/U 


241 






for


Helicases conserved C-terminal domain


1496 


40 


612 


5187 


for 


protein kinase 


1503 


154 


216 


4870 


for 


Zinc finger, C2H2 type 


1514 


2 


252 


4662 


for 


RNA recognition motif, (aka RRM, 
RBD, or RNP domain) 


1527 


156 


212 


3520 


for 


Zinc finger, C2H2 type 


1538 


9 


635 


11087 


for 


wnt family of developmental signaling 
proteins 


1540 


289 


471 


4107 


for 


Basic region plus leucine zipper 
transcription factors 


1549 


200 


391 


4118 


for 


Basic region plus leucine zipper 
transcription factors 


1556 


163 


354 


3958 


for 


Basic region plus leucine zipper 
transcription factors 


1557 


207 


398 


4038 


for 


Basic region plus leucine zipper 
transcription factors 


1563 


107 


298 


3978 


for 


Basic region plus leucine zipper 
transcription factors 









SEQ ID


Start 


Stop 


Score 


Direction 


Description 


1622 


180 


365 


4022 


for 



Basic region plus leucine zipper 
transcription factors 


1630 


100 


291 


3998 


for


Basic region plus leucine zipper 
transcription factors 


1674 


196 


258 


4880 


for 


Zinc finger, C2H2 type 


1676 


9 


86 


6610 


for 


Homeobox Domain 


1677 


316 


369 


5780 


rev 


Thioredoxins 


1688 


109 


410 


17414 


for 


Ras family 


1704 


184 


372 


3977 


for 


Basic region plus leucine zipper 
transcription factors 


1707 


92 


439 


24100 


rev 


Phosphatidylinositol-specific 
pnubpnuiipaSc i Qomain 


1711


261 


Jul 


6400 


for


WD domain, G-beta repeats


1744 




411 


10S79 



rev 


Serine carboxypeptidases


1755 


281 


367 


2580 


for 


EF-hand 


1762 


236 


334 


5880 


for 


WD domain, G-beta repeats 


1779 


64 


126 


4790 


for


Zinc finger, C2H2 type 


1801 


295 


351 


4030 


for 


Zinc finger, C2H2 type 


1804 


301 


378 


3460 


for


Ank repeat 


1808 


36 


161 


4170 


for 


Basic region plus leucine zipper 
transcription factors 


1811 


184 


315 




for 


N-terminal homology in EtS domain 


1814 


127 


294 


10770 


for 


Bromodomain (conserved sequence 
found in human, Drosophila and yeast 

proteins.)


1818 


o 


146 


4741 


Tnp 


Double-stranded RNA binding motif


1819 


278 


355 
—> ~j ~j 


1460 


for 


Ank repeat


1820 


123 


299 


12150 


for 


Homeobox Domain


1821 


127 


303 


12180 


for 


Homeobox Domain 


1830 


184 


267 


4270 


for


Ank repeat 


1832 


18 


173 


8987 


for 


SH3 Domain 


1835 


51 


206 


8987 


for 


SH3 Domain 


1839 


224 


307 


4270 


for 


Ank repeat 


1846 


12 


398 


36700 


for 


G-protein alpha subunit 






SEQ ID


Start 


Stop


Score 


Direction 



Description


1909 


160 


258 


6370 


for 


WD domain, G-beta repeats 


1 Ol 1 




1^1 
1 J 1 


yjjj 


for


Zinc finger, C3HC4 type (RING finger)


1 OCA 

1 VoU 


£A 


1 Q7 


701 7 


for


Zinc finger, C3HC4 type (RING finger)


ZIKO 


Z j j 


^A£ 


</t 1 A 

j41 0 


for


Zinc finger, CCHC class


0 1 1 ^ 
Z 1 J J 


z 


/1A1 
4U1 


1 A^O/C 

lOjyo 


for


ATPases Associated with Various
Cellular Activities


ZZ 1 0 


on 


1 70 


JjOU 


for


WW/rsp5/WWP domain containing 
proteins


ZZ1 o 


1 0*7 
J Z / 


ZZ J 


^<AA 
JJ\)\J 


for 


WD domain, G-beta repeats


2281 


20 


387 


6044 


for 


Protein Tyrosine Phosphatase 


ZZOZ 


1 Ol 

loi 


353 


5136 


for


C2 domain (prot. kinase C like) 


2286 


12 


382 


5228 


for 


protein kinase 


T5 1 A 

23 JO 


20 


371 


5962 


for


Protein Tyrosine Phosphatase 


2363 


48


211


4132


for


Basic region plus leucine zipper 
transcription factors


2424 


43 


194 


3996 


for 


Basic region plus leucine zipper 
transcription factors 


2428 


25 


350 


4675 


for 


Dual specificity phosphatase, catalytic 
domain 


2562 


18


101


4560


for


Ank repeat 


2577 


0 


311 


10295 


for 


4 transmembrane segments integral 
membrane proteins 


2591 


60 


165 


4560 


for 


SH2 Domain 


2684


9


461


5759


for


ATPases Associated with Various 
Cellular Activities


2o2o 


1 1 £i 

1 lo 


/I A A 

400 


1^1 AT 

1610/ 


for


DEAD and DEAH box helicases


Zojy 


1 AA 
10U 


320 


5550


rev


ATPases Associated with Various 
Cellular Activities


Zo / i 


1 QS 

170 


jVZ 


o'JC/i 
yjoH 


for


DEAD and DEAH box helicases


2944 


18 


281 


10480 


for 


Calpain large subunit, domain III 


1GAG 


C 
J 


JO / 


5V /O 


rev 


protein kinase 


3015 


131 


214 


3600 


for 


Ank repeat 






ZyZ 


5295


for


WD domain, G-beta repeats 


3081 


190 


252 


4360 


for 


Zinc finger, C2H2 type 


1 1 AC 
J Ivo 


2 / J 


367 


5791


for


WD domain, G-beta repeats 


1 1 zl7 


1 OA 


iao 
joy 


A A77 

4UZZ 


for


Basic region plus leucine zipper
transcription factors 


3152 


129 


320 


3947 


for 


Basic region plus leucine zipper 
transcription factors 


3158 


167 


334 


4180 


for 


Basic region plus leucine zipper 
transcription factors 


3175 


14 


164 


5951 


for 


mkk like kinases 



SEQ ID


Start


Stop 


Score 


Direction 


Description 


3175 


8 


112 


5968 


for 


protein kinase 


3178 


45 


386 


19398 


for 


ATPases Associated with Various 
Cellular Activities 


3183 


14 


215 


9133 


for 


4 transmembrane segments integral 
membrane proteins 


3190 


229 


390 


6089 


for 


mkk like kinases 


3190 


118 


390 


8063 


for 


protein kinase 


3193 


293 


355 


3570 


for 


Zinc finger, C2H2 type 


3195 


0 


215 


10146 


for 


4 transmembrane segments integral 
membrane proteins 


3197 


281 


343 


4490 


for 


Zinc finger, C2H2 type 


3208 


34 


256 


4190 


for 


Basic region plus leucine zipper 
transcription factors 


3258 


138 


394 


9877 


for 


Ras family 


3266 


8 


139 


9328 


for 


ATPases Associated with Various 
Cellular Activities 


3267 


97 


180 


3820 


for 


Ank repeat 


3274 


11 


187 


15442 


for 


Fork head domain, eukaryotic 
transcription factors 


3281 


15 


182 


9681 


for 


mkk like kinases 


3285 


16 


102 


4680 


for 


EF-hand 


3292 


208 


300 


5585 


for 


WD domain, G-beta repeats 


3297 


7 


153 


6100 


for 


Helicases conserved C-terminal domain 


3306 


161 


223 


4900 


for 


Zinc finger, C2H2 type 


3307 


43 


321 


8740 


for 


SH2 Domain 


3339 
— 33*5— 


94 
— 65- 


342 
271 


14970 
12512 


for 

far 


SH2 Domain 

PDZ domain 


3351 


124 


270 


6068 


for 


Phorbol esters/diacylglycerol binding 
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Example 4 

Differential Expression of Polynucleotides of the Invention: 
Description of Libraries and Detection of Differential Expression 

The relative expression levels of the polynucleotides of the invention 
were assessed in several libraries prepared from various sources, including cell lines and 
patient tissue samples. Table 6 provides a summary of these libraries, including the 
shortened library name (used hereafter), the mRNA source used to prepare the cDNA 
library, the abbreviated name of the library that is used in the tables below (in quotes), 
and the approximate number of clones in the library.



Table 6 

Description of cDNA Libraries 



Library (lib #) | Description | Number of Clones in this Clustering
1 | KM12L4: Human Colon Cell Line, High Metastatic Potential (derived from KM12C); "High Colon" | 307133
2 | KM12C: Human Colon Cell Line, Low Metastatic Potential; "Low Colon" | 284755
3 | MDA-MB-231: Human Breast Cancer Cell Line, High Metastatic Potential; micro-metastases in lung; "High Breast" | 326937
4 | MCF7: Human Breast Cancer Cell, Non Metastatic; "Low Breast" | 318979
8 | MV-522: Human Lung Cancer Cell Line, High Metastatic Potential; "High Lung" | 223620
9 | UCP-3: Human Lung Cancer Cell Line, Low Metastatic Potential; "Low Lung" | 312503
12 | Human microvascular endothelial cells (HMEC) - Untreated; PCR (OligodT) cDNA library | 41938
13 | Human microvascular endothelial cells (HMEC) - Basic fibroblast growth factor (bFGF) treated; PCR (OligodT) cDNA library | 42100
14 | Human microvascular endothelial cells (HMEC) - Vascular endothelial growth factor (VEGF) treated; PCR (OligodT) cDNA library | 42825
15 | Normal Colon - UC#2 Patient; PCR (OligodT) cDNA library; "Normal Colon Tumor Tissue" | 34285
16 | Colon Tumor - UC#2 Patient; PCR (OligodT) cDNA library; "Normal Colon Tumor Tissue" | 35625
17 | Liver Metastasis from Colon Tumor of UC#2 Patient; PCR (OligodT) cDNA library; "High Colon Metastasis Tissue" | 36984
18 | Normal Colon - UC#3 Patient; PCR (OligodT) cDNA library; "Normal Colon Tumor Tissue" | 36216
19 | Colon Tumor - UC#3 Patient; PCR (OligodT) cDNA library; "High Colon Tumor Tissue" | 41388
20 | Liver Metastasis from Colon Tumor of UC#3 Patient; PCR (OligodT) cDNA library; "High Colon Metastasis Tissue" | 30956
21 | GRRpz: Human Prostate Cell Line | 164801
22 | WOca: Human Prostate Cancer Cell Line | 162088

The KM12L4 and KM12C cell lines are described in Example 1 above. 
The MDA-MB-231 cell line was originally isolated from pleural effusions (Cailleau, J. 
Natl. Cancer Inst. (1974) 53:661), is of high metastatic potential, and forms poorly 
differentiated adenocarcinoma grade II in nude mice consistent with breast carcinoma. 
The MCF7 cell line was derived from a pleural effusion of a breast adenocarcinoma and 
is non-metastatic. The MV-522 cell line is derived from a human lung carcinoma and is 
of high metastatic potential. The UCP-3 cell line is a low metastatic human lung 
carcinoma cell line; the MV-522 is a high metastatic variant of UCP-3. These cell lines 
are well-recognized in the art as models for the study of human breast and lung cancer 
(see, e.g., Chandrasekaran et al., Cancer Res. (1979) 39:870 (MDA-MB-231 and MCF-7); 
Gastpar et al., J Med Chem (1998) 41:4965 (MDA-MB-231 and MCF-7); Ranson et 
al., Br J Cancer (1998) 77:1586 (MDA-MB-231 and MCF-7); Kuang et al., Nucleic 
Acids Res (1998) 26:1116 (MDA-MB-231 and MCF-7); Varki et al., Int J Cancer 
(1987) 40:46 (UCP-3); Varki et al., Tumour Biol (1990) 11:327 (MV-522 and UCP-3); 
Varki et al., Anticancer Res. (1990) 10:637 (MV-522); Kelner et al., Anticancer Res 
(1995) 15:867 (MV-522); and Zhang et al., Anticancer Drugs (1997) 8:696 (MV-522)). 
The samples of libraries 15-20 are derived from two different patients (UC#2 and 
UC#3). The bFGF-treated HMEC were prepared by incubation with bFGF at 10 ng/ml 
for 2 hrs; the VEGF-treated HMEC were prepared by incubation with 20 ng/ml VEGF 
for 2 hrs. Following incubation with the respective growth factor, the cells were 
washed and lysis buffer added for RNA preparation. The GRRpz cell line refers to low 
passage (3 passages or fewer) human prostate cells, and the WOca cell line refers to low 
passage (3 passages or fewer) human prostate cancer cells.

Each of the libraries is composed of a collection of cDNA clones that in 
turn are representative of the mRNAs expressed in the indicated mRNA source. In 
order to facilitate the analysis of the millions of sequences in each library, the sequences 
were assigned to clusters. The concept of "cluster of clones" is derived from a 
sorting/grouping of cDNA clones based on their hybridization pattern to a panel of 
roughly 300 7 bp oligonucleotide probes (see Drmanac et al., Genomics (1996) 
37(1):29). Random cDNA clones from a tissue library are hybridized at moderate 
stringency to 300 7 bp oligonucleotides. Each oligonucleotide has some measure of 
specific hybridization to that specific clone. The combination of 300 of these measures 
of hybridization for 300 probes equals the "hybridization signature" for a specific clone. 
Clones with similar sequence will have similar hybridization signatures. By developing 
a sorting/grouping algorithm to analyze these signatures, groups of clones in a library 
can be identified and brought together computationally. These groups of clones are 
termed "clusters". Depending on the stringency of the selection in the algorithm 
(similar to the stringency of hybridization in a classic library cDNA screening protocol), 
the "purity" of each cluster can be controlled. For example, artifacts of clustering may 
occur in computational clustering just as artifacts can occur in "wet-lab" screening of a 
cDNA library with 400 bp cDNA fragments, at even the highest stringency. The 
stringency used in the implementation of clustering herein provides groups of clones that 
are in general from the same cDNA or closely related cDNAs. Closely related clones 
can be a result of different length clones of the same cDNA, closely related clones from 
highly related gene families, or splice variants of the same cDNA.
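By way of illustration only, the grouping of clones by hybridization signature can be sketched as follows. The text does not disclose the sorting/grouping algorithm actually used, so this sketch substitutes a simple greedy grouping on Pearson correlation; the function names, the `stringency` threshold, and the four-probe signatures in the usage note are hypothetical (a real signature would hold roughly 300 measurements).

```python
from math import sqrt


def correlation(a, b):
    """Pearson correlation between two equal-length hybridization signatures."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat signature carries no pattern to compare
    return cov / sqrt(var_a * var_b)


def cluster_clones(signatures, stringency=0.9):
    """Greedy grouping (an assumption, not the disclosed algorithm).

    Each clone joins the first cluster whose founding signature correlates
    with it at or above `stringency`; otherwise it founds a new cluster.
    A higher `stringency` gives "purer" but more numerous clusters.
    """
    clusters = []  # list of (founder_signature, [clone ids])
    for clone_id, sig in signatures.items():
        for founder, members in clusters:
            if correlation(founder, sig) >= stringency:
                members.append(clone_id)
                break
        else:
            clusters.append((sig, [clone_id]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    # Hypothetical four-probe signatures for three clones.
    demo = {
        "cloneA": [1.0, 2.0, 3.0, 4.0],
        "cloneB": [1.1, 2.0, 3.2, 3.9],
        "cloneC": [4.0, 3.0, 2.0, 1.0],
    }
    print(cluster_clones(demo))
```

In this sketch the `stringency` parameter plays the role that hybridization stringency plays in a wet-lab screen: lowering it merges more loosely related clones into one cluster.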

Differential expression for a selected cluster was assessed by first 
determining the number of cDNA clones corresponding to the selected cluster in the 
first library (Clones in 1st), and then determining the number of cDNA clones 
corresponding to the selected cluster in the second library (Clones in 2nd). Differential 
expression of the selected cluster in the first library relative to the second library is 
expressed as a "ratio" of percent expression between the two libraries. In general, the 
"ratio" is calculated by: 1) calculating the percent expression of the selected cluster in 
the first library by dividing the number of clones corresponding to a selected cluster in 
the first library by the total number of clones analyzed from the first library; 
2) calculating the percent expression of the selected cluster in the second library by 
dividing the number of clones corresponding to a selected cluster in a second library by 
the total number of clones analyzed from the second library; 3) dividing the calculated 
percent expression from the first library by the calculated percent expression from the 
second library. If the "number of clones" corresponding to a selected cluster in a library 
is zero, the value is set at 1 to aid in calculation. The formula used in calculating the 
ratio takes into account the "depth" of each of the libraries being compared, i.e., the 
total number of clones analyzed in each library.
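The three-step ratio calculation described above can be sketched as follows. The function and argument names are illustrative and do not appear in the specification; the usage values are the Table 6 clone totals for libraries 3 and 4 and the clone counts listed for SEQ ID NO:472 in Table 7. Rounding to a whole number for display is an assumption.

```python
def expression_ratio(clones_1st, total_1st, clones_2nd, total_2nd):
    """Ratio of percent expression of one cluster between two libraries.

    A clone count of zero is set to 1, as stated in the text, so that the
    division is always defined.
    """
    c1 = clones_1st if clones_1st > 0 else 1
    c2 = clones_2nd if clones_2nd > 0 else 1
    percent_1st = 100.0 * c1 / total_1st  # step 1: percent expression, first library
    percent_2nd = 100.0 * c2 / total_2nd  # step 2: percent expression, second library
    return percent_1st / percent_2nd      # step 3: ratio of the two


if __name__ == "__main__":
    # lib3 (MDA-MB-231): 326937 clones; lib4 (MCF7): 318979 clones.
    ratio = expression_ratio(64, 326937, 0, 318979)
    print(round(ratio))
```

Because each count is divided by its own library total, a cluster is not scored as over-expressed merely because one library was sampled more deeply than the other.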




In general, a polynucleotide is said to be significantly differentially 
expressed between two samples when the ratio value is greater than at least about 2, 
preferably greater than at least about 3, more preferably greater than at least about 5, 
where the ratio value is calculated using the method described above. The significance 
of differential expression is determined using a z score test (Zar, Biostatistical Analysis, 
Prentice Hall, Inc., USA, "Differences between Proportions," pp 296-298 (1974)).
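A z score test for the difference between two proportions can be sketched as follows. The specification cites Zar but does not reproduce the formula, so the pooled-proportion form without continuity correction shown here is an assumption, as are the function names; the usage values are again the SEQ ID NO:472 counts from Table 7 and the library totals from Table 6.

```python
from math import erfc, sqrt


def two_proportion_z(clones_1st, total_1st, clones_2nd, total_2nd):
    """Pooled two-proportion z score (assumed form; no continuity correction)."""
    p1 = clones_1st / total_1st
    p2 = clones_2nd / total_2nd
    pooled = (clones_1st + clones_2nd) / (total_1st + total_2nd)
    variance = pooled * (1.0 - pooled) * (1.0 / total_1st + 1.0 / total_2nd)
    if variance == 0:
        return 0.0  # no clones in either library: nothing to test
    return (p1 - p2) / sqrt(variance)


def two_sided_p_value(z):
    """Two-sided p-value for a z score under the standard normal distribution."""
    return erfc(abs(z) / sqrt(2.0))


if __name__ == "__main__":
    z = two_proportion_z(64, 326937, 0, 318979)
    print(z, two_sided_p_value(z))
```

Here the raw clone counts are used, without the zero-to-one substitution that the ratio calculation applies; whether the original analysis did the same is not stated.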
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EXAMPLE 5 

Polynucleotides Differentially Expressed in High Metastatic Potential 
Breast Cancer Cells Versus Low Metastatic Breast Cancer Cells 

A number of polynucleotide sequences have been identified that are 
differentially expressed between cells derived from high metastatic potential breast 
cancer tissue and low metastatic breast cancer cells. Expression of these sequences in 
breast cancer can be valuable in determining diagnostic, prognostic and/or treatment 
information. For example, sequences that are highly expressed in the high metastatic 
potential cells can be indicative of increased expression of genes or regulatory 
sequences involved in the metastatic process. A patient sample displaying an increased 
level of one or more of these polynucleotides may thus warrant more aggressive 
treatment. In another example, sequences that display higher expression in the low 
metastatic potential cells can be associated with genes or regulatory sequences that 
inhibit metastasis, and thus the expression of these polynucleotides in a sample may 
warrant a more positive prognosis than the gross pathology would suggest. 

The differential expression of these polynucleotides can be used as a 
diagnostic marker, a prognostic marker, for risk assessment, patient treatment and the 
like. These polynucleotide sequences can also be used in combination with other 
known molecular and/or biochemical markers.

The following tables summarize polynucleotides that are differentially 
expressed between high metastatic potential breast cancer cells and low metastatic 
potential breast cancer cells. 

Table 7 

Differentially expressed polynucleotides: Higher expression in 
high metastatic potential breast cancer (lib3) relative to low metastatic 
breast cancer cells (lib4)



SEQ ID NO     Lib3 clones   Lib4 clones   lib3/lib4
472           64            0             62
1851          6             0             6
1856          8             0             8
1867          6             0             6
1872          6             0             6
1875          12            3             4
1923          89            22            4
2118          7             0             7
2119          7             0             7
2135          37            13            3
2190          19            0             19
2193          16            5             3
2232          12            2             6
2239          6             0             6
2338          21            2             10
2378          16            4             4
2394          6             0             6
2395          6             0             6
2490          13            3             4
2505          16            2             8
2540          8             1             8
2542          11            1             11
2607          11            2             5
2640          22            5             4
2674          8             0             8
2679          19            0             19
2684          14            4             3
2707          8             0             8
2724          9             0             9
2757          6             0             6
2776          10            0             10
2804          13            2             6
[illegible]   [illegible]   0             6
2906          14            0             14
2959          26            8             3
2964          17            4             4
2968          6             0             6
2977          22            3             7
2980          13            1             13
3010          6             0             6
3043          10            1             10
3071          33            12            3
3072          9             1             9
3095          19            3             6
3097          11            2             5
3173          12            2             6
3203          8             1             8
3210          27            8             3
3212          13            1             13
3284          8             0             8
3288          6             0             6
3331          14            3             5
3335          13            1             13



Table 8 

Differentially expressed polynucleotides: Higher expression in 
low metastatic breast cancer cells (lib4) relative to high metastatic 
potential breast cancer (lib3) 



SEQ ID NOs: 


Lib3 clones 


Lib4 clones 


lib4/lib3


4Uz 


u 


o 


z: 
O 


014 


i 


Z 1 


7 


OZ4 


a 
u 


0 


o 


ozo 


a 
U 


o 
o 


o 

o 


/ iz 


A 

U 


0 

y 


Q 

y 


1A A 


A 

u 


7 


7 


1 1?S 


9 
z 


Z7 


1 ^ 


1452 


2 


13 


7 


1880 


0 


9 


9 


1915 


0 


7 


7 


1951 


0 


6 


6 


1955 


8 


32 


4 


2015 


0 


7 


7 


2046 


0 


7 


7 


2076 


1 


22 


23 


2087 


0 


6 


6 


2124 


0 


9 


9 


2145 


0 


8 


8 


2162 


0 


6 


6 


2163 


0 


12 


12 


2164 


5 


19 


4 


2172 


2 


15 


8 


2192 


5 


16 


3 


2244 


20 


43 


2 


2266 


3 


18 


6 


2313 


24 


56 


2 


2346 


1 


13 


13 



Lib4 clones 


lib4/lib3


Z. J J J 


n 
u 


1 n 


1 n 


z. j / 1 




D 


0 


Z. J 7J 


1 

1 


1 7 
1 / 


1 7 
1 / 


9404 


1 

1 


9 1 
Z 1 


ZZ 




fl 


0 


0 


94Afi 
ZtOU 


n 


1 i 
1 1 


i i 
1 1 


9^91 


n 
u 


0 


o 




i 
i 


i n 
1 u 


1 u 


9^78 


u 


0 


O 


9^ft4 
Zjo^ 


i 


1 7 


1 7 
1 / 


7con 

ZJ7U 


u 


0 


o 


7AflQ 


1 
1 


o 

y 


Q 

y 


ZOjZ 


J 


OA 

Z4 


c 

J 


771 /I 
Z / 14 


c 

J 


OA 

z4 


c 


777Q 
z/zo 


A 
U 


0 


o 


Z OZ 


1 
1 


1 A 

14 


1 /I 

14 


7 7Q/1 

Z ly*\ 


A 

4 


1 c 
1 J 


4 


707/C 

ZoZO 


A 
U 


7 


7 


OQOT 

zyo/ 


C 
J 


1 c 

I J 


3 


3005 


1 
1 


14 


14 


1(\f\Q 


ZU 


CO 

JO 


3 


jU4/ 


A 

4 


1 7 
1 / 


A 

4 


JUj / 


Z 


1 7 
1 / 


n 

y 


10.7^ 


Z 


1 i 
1 1 


/: 
O 


lfi7£ 
JU /O 


A 

u 


o 


c 
O 


J 1 uz 


A 
U 


o 


D 


TrTS 

j i zo 







JZ 


4- 


1 1 19 


1 s 

1 J 


^9 

JZ 


/I 
4 


3142 


0 


6 


6 


3187 


22 


49 


2 


3253 


23 


96 


4 


3282 


19 


46 


2 


3285 


20 


40 


2 


3346 


0 


9 


9 



\ 
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EXAMPLE 6 

Polynucleotides Differentially Expressed in High Metastatic Potential Lung 
Cancer Cells Versus Low Metastatic Lung Cancer Cells 



A number of polynucleotide sequences have been identified that are 
differentially expressed between cells derived from high metastatic potential lung 
cancer cells and low metastatic lung cancer cells. Expression of these sequences in lung 
cancer tissue can be valuable in determining diagnostic, prognostic and/or treatment 
information. For example, sequences that are highly expressed in the high metastatic 
potential cells can be indicative of increased expression of genes or regulatory 
sequences involved in the metastatic process. A patient sample displaying an increased 
level of one or more of these polynucleotides may thus warrant more aggressive 
treatment. In another example, sequences that display higher expression in the low 
metastatic potential cells can be associated with genes or regulatory sequences that 
inhibit metastasis, and thus the expression of these polynucleotides in a sample may 
warrant a more positive prognosis than the gross pathology would suggest. 

The differential expression of these polynucleotides can be used as a 
diagnostic marker, a prognostic marker, for risk assessment, patient treatment and the 
like. These polynucleotide sequences can also be used in combination with other 
known molecular and/or biochemical markers. 

The following tables summarize polynucleotides that are differentially 
expressed between high metastatic potential lung cancer cells and low metastatic 
potential lung cancer cells:
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Table 9 

Differentially expressed polynucleotides: Higher expression in high 
metastatic potential lung cancer cells (lib8) relative to low 
metastatic lung cancer cells (lib9) 



SEQ ID NO: 


Lib8 clones 


Lib9 clones 


lib8/lib9


14 


10 


0 


10 


137 


5 


0 


5 


151 


5 


0 


7 


152 


9 


0 


13 


171 


6 


0 


8 


200 


10 


0 


14 


254 


5 


0 


7 


262 


5 


0 


7 


271 


5 


0 


7 


348 


6 


1 


8 


412 


5 


0 


7 


507 


5 


0 


7 


520 


6 


0 


8 


530 


5 


0 


7 


588 


5 


0 


7 


623 


7 


0 


10 


637 


7 


0 


10 


660 


5 


0 


7 


678 


8 


0 


11 


680 


5 


0 


7 








6 


714 


28 


13 


3 


774 


11 


0 


15 


812 


5 


0 


7 


834 


8 


2 


6 


901 


11 


2 


8 


1168 


5 


0 


7 


1333 


6 


0 


8 


1352 


5 


0 


7 


1524 


11 


1 


15 


1706 


5 


0 


7 


1752 


17 


9 


3 


1768 


20 


4 


7 


1769 


5 


0 


7 


1780 


6 


0 


8 
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SEO ID NO* 


Lib8 clones 


Lib9 clones 


lib8/lib9


1781 


40 




19 


1799 


6 


1 
1 


0 
o 


1803 




1 
I 


ft 
o 


181 1 


16 


Q 


9 

z 


1884 


6 


o 


o 


1919 




1 

1 


1 1 

1 1 


1939 

1 7 J 7 


6 


o 

V/ 


C 
o 


197S 

17 / J 


43 


Q 


7 


9094 

Z. V/Z^ 


19 

1 z 


1 
1 


1 7 
i / 




o 


1 
1 


1 1 
i i 




90 

ZU 


1 3 


9 
z 


9071 

Zv / 1 


16 


A 




9198 

Z 1 zo 


_? 


o 

V 


7 


9177 
z i / / 


10 


9 

Z 


7 


9181 

Z 1 O 1 


44 


1 3 


<; 


9 1 84 

Z 1 0*T 


1 1 
1 1 


i 
i 


1 ^ 


918S 

Z 1 OJ 


1 0 






995H 

ZZO J 


7 


A 
W 


1 n 


9^1 1 

Z J 1 1 


1 0 
1 u 


t 


J 


931 4 


1 0 




1 4 


939^ 


1 A 


6 
O 


-2 
J 


9398 
z J to 


O 


1 
1 


O 
0 


9460 


1 0 
1 u 


A 


-2 
J 


9S1 A 
Z J 1 H 


Z" 

0 


A 
U 


Q 
O 


9S97 






7 


96S7 

ZOJ / 


o 


z 


z; 
O 


9669 


z. 

o 


i 
i 


Q 
o 


9670 

ZU / U 


z. 

o 


i 
i 


c 

o 


3047 


91 

Z 1 


J 


1 \J 


30S0 


16 


J 


A 


3099 


7 


1 

I 


1 0 


3140 


181 


1 19 


9 
z 


31 S7 




o 


7 


3187 


16 


j 




3210 


5 


o 


7 


3220 


28 


4 


10 


3236 


7 


1 


10 


3249 


16 


0 


22 


3264 


8 


2 


6 


3305 


7 


0 


10 


3309 


20 


0 


28 
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SEQ ID NO: 


Lib8 clones 


Lib9 clones 


lib8/lib9


3318 


24 


4 


8 


3330 


5 


0 


7 


3331 


5 


0 


7 



Table 10 

Differentially expressed polynucleotides: Higher expression in low metastatic lung 
cancer cells (lib 9) relative to high metastatic potential lung cancer cells (lib 8) 



SEQ ID NO: 


Lib 8 clones 


Lib 9 clones


lib 9/lib 8 


24 


3 


20 


5 


53 


0 


18 


13 


64 


0 


8 


6 


70 


0 


11


8 


105 


10 


66 


5 


129 


0 


16 


11 


^ \ A 

214 


1 


14 


10 




4 


35 


6 


2ol 


0 


13 


9 


264 


0 


29 


^ i 
21 


329 


2 


1 *7 

17 


6 


3oo 


1 


37 


26 


J /U 


n 
u 


1 1 
1 1 


o 


418 


0 


8 


6 


450 

461 


0 

— o 


9 

9 


6 


484 


0 


26 


6 

19 


494 


0 


41 


29 


517 


1 


12 


9 


522 


1 


11 


8 


581 


1 


17 


12 


614 


3 


23 


5 


706 


0 


11 


8 


726 


5 


23 


3 


806 


0 


14 


10


824 


0 


9 


6 


836 


1 


14 


10 


874 


0 


12 


9 


900 


5 


21 


3 


1017 


2 


14 


5 



SEQ ID NO: 


Lib 8 clones 


Lib 9 clones 


lib 9/lib 8


1 144 


o 


ft 

o 


o 


1 1 S4 

1 I J^r 


o 


1 7 
1 z 


Q 


1 1 &f\ 


7 

z 


4S 


1 £ 
1 0 


1 1 70 


1 

1 


1 1 
1 J 


0 

y 


1 107 


7 


1 1 


<: 

j 


1 17£ 


1 
1 


1 1 
1 J 


0 


1 177 
1 JZ / 


1 
1 


1 1 


y 


1 1£7 
1 JO / 


ft 


1 7 
1 Z 


A 

y 


1 177 


A 
U 


1 7 
1 Z 


A 


1 417 


z 


1 ft 
1 o 


o 


1447 


1 
1 


1 4 


1 0 


1 4£A 
1HOO 


A 
U 


1 1 


A 

y 


1 47£ 

1 *T /D 


A 
U 


1 1 


A 

y 




A 
U 


o 
o 


0 




1 
1 


1 1 
1 J 


A 

y 


1 ££4 


JO 


7^1 


c 

J 


1 Aft7 
lOoZ 


1 
1 


1 7 


1 7 

Iz 


1 Aft7 


A 
U 


o 
y 


o 


I /JO 


A 
U 


o 
0 


o 


1 ft 1 7 
1 0 1 / 




1 0 




1 ft17 


J 


1 £ 
10 


A 

H 


1 ftA^ 


'J 


ZJ 


c 

J 


1 


Z 


1 7 
1 / 


/: 
O 


1 oi n 


1 
1 


1 ft 
1 0 


1 1 

13 


71 Afx 
Z 1 HO 


Z 


1 A 
1 0 


A 

y 


71 ^£ 
Zl 30 


A 
U 


A 

y 


o 


74A1 
ZHOj 


A 


1 9 

1 z 


A 

y 


7774 
Z / ZH 


i o 

1U 


JO 




774Q 




7000 


4 


2801 


6 


25 


3 


2993 


3 


18 


4 


3080 


0 


10 


7 


3107 


3 


23 


5 


3292 


0 


20 


14 


3324 


110 


548 


4 



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EXAMPLE 7 

Polynucleotides Differentially Expressed in High Metastatic Potential 
Colon Cancer Cells Versus Low Metastatic Colon Cancer Cells 

A number of polynucleotide sequences have been identified that are 
differentially expressed between cells derived from high metastatic potential colon 
cancer cells and low metastatic colon cancer cells. Expression of these sequences in 
colon cancer tissue can provide diagnostic, prognostic and/or treatment information. 
For example, sequences that are highly expressed in the high metastatic potential cells 
can be indicative of increased expression of genes or regulatory sequences involved in 
the metastatic process. A patient sample displaying an increased level of one or more of 
these polynucleotides may thus warrant more aggressive treatment. In another example, 
sequences that display higher expression in the low metastatic potential cells can be 
associated with genes or regulatory sequences that inhibit metastasis, and thus the 
expression of these polynucleotides in a sample may warrant a more positive prognosis 
than the gross pathology would suggest. 

The differential expression of these polynucleotides can be used as a 
diagnostic marker, a prognostic marker, for risk assessment, patient treatment and the 
like. These polynucleotide sequences can also be used in combination with other 
known molecular and/or biochemical markers. 

The following table summarizes identified polynucleotides with 
differential expression between high metastatic potential colon cancer cells and low 
metastatic potential colon cancer cells.




Table 11 

Differentially expressed polynucleotides: Higher expression in low metastatic colon 
cancer cells (lib 2) relative to high metastatic potential colon cancer cells (lib 1)



SEQ ID NOs: 


Lib 1 clones 


Lib 2 clones 


lib 2/lib 1 


429 


0 


9 


10 


1494 


0 


8 


9 


1923 


34 


114 


4 


1986 


3 


12 


4 


2018 


0 


9 


10 


2036 


2 


10 


5 


2049 


8 


25 


3 


2135 


24 


87 


4 



\ 
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ocy iu in US. 


Lib 1 clones 


Lib 2 clones 


lib 2/lib 1


Z 1 HO 


9 

z 


lO 


A 

y 


9908 


o 


97 
Z / 


c 
J 


ZZ 1 J 


z 


i 1 
1 1 


c 
O 


99^0 

zzjy 


1 

1 


1 A 


i i 
1 1 


ZjU/ 


9 
z 


1 9 
1 Z 


c 
0 


Z3 1 j 


98 
Zo 


£9 
OZ 


o 

Z 


Zjj / 




14 


J 


91 AH 
ZjOU 


J 


9 1 
Zl 


O 
0 


9TA9 
ZJOZ 


A 

u 


/: 
O 


/: 
O 


ZJ /o 


-J 
J 


1 9 
1 Z 


4 




-J 
J 


9A 

zU 


o 
/ 


9 ^71 
Z J / 1 


A 
U 


O 


o 


O^QC 
ZJOO 


^/l 


1 TO 

1 /z 


J 


O^QO 
ZJVZ 


1 c 
1 J 


/I 1 
4 1 


3 


OA1 1 

zol 1 


A 
U 


o 


zr 
O 


ZOJO 


A 
U 


n 
y 


1 A 
1U 


O/^/t 1 

zo41 


O 


OA 

zU 


J 


OA^A 


A 


A 

y 


1 A 


zooz 


A 


A 

y 


1 A 
10 


ZO /4 


4 


1 J 


A 

4 


ZOOZ 


A 


c 
O 


r 
O 


Z/Uz 


y 


o c 

ZJ 


3 


OTA/1 

z /U4 


o 
0 


0*2 
ZJ 


J 


Z / 1 J 


z 


1 o 
12 


z" 
O 


ZoU4 


A 

y 


oo 

zz 


3 


OOO 1 

zozl 


1 1 


OA 

zb> 


z 


10/1 A 
Z54U 


i 


o 
5 


/\ 
y 


Zo4o 


z 


1 c 

1 J 


o 
0 


ZoOO 


A 


o 


£ 

0 


9QA< 
ZVUO 


A 


O 


z: 
0 


901 ^ 

zy l j 


AA 
44 


1 AO 

iuy 


i 
J 


zyjj 


A 


0 


c 
0 


9Q1<\ 
Z/jj 


c 

J 


10 


J 


9CK7 
Z" J / 


1 


i i 
1 1 


1 o 
1Z 


9QSQ 


J 


97 
z / 


in 


2977 


16 


30 


2 


2980 


12 


27 


2 


3000 


2 


13 


7 


3009 


12 


29 


3 


3115 


0 


7 


8 


3156 


502 


2170 


5 
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SEQ ID NOs: 


Lib 1 clones 


Lib 2 clones 


lib 2/lib 1 


3210 


2 


21 


11 


3211 


0 


9 


10 


3213 


0 


7 


8 


3235 


2 


12 


6 


3251 


2 


12 


6 


3296 


3 


12 


4 


3335 


1 


8 


9 



EXAMPLE 8 

Polynucleotides Differentially Expressed in High Metastatic Potential 
Colon Cancer Patient Tissue Versus Normal Patient Tissue 

A number of polynucleotide sequences have been identified that are 
differentially expressed between cells derived from high metastatic potential colon 
cancer tissue and normal tissue. Expression of these sequences in colon cancer tissue 
can provide diagnostic, prognostic and/or treatment information. For example, 
sequences that are highly expressed in the high metastatic potential cells can be 
indicative of increased expression of genes or regulatory sequences involved in the 
advanced disease state which involves processes such as angiogenesis, dedifferentiation, 
cell replication, and metastasis. A patient sample displaying an increased level of one 
or more of these polynucleotides may thus warrant more aggressive treatment. 

The differential expression of these polynucleotides can be used as a 
diagnostic marker, a prognostic marker, for risk assessment, patient treatment and the 
like. These polynucleotide sequences can also be used in combination with other 
known molecular and/or biochemical markers. 

The following tables summarize polynucleotides that are differentially 
expressed between high metastatic potential colon cancer tissue and normal colon 
tissue:
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Table 12 

Differentially expressed polynucleotides isolated from samples from two patients 
(patient 2 and patient 3): Lower expression in high metastatic potential colon tissue 
(patient 2: lib 17; patient 3: lib 20) vs. normal colon tissue (patient 2: lib 15; patient 
3: lib 18)



SEQ ID NO: 


lib 15 clones 


lib 17 clones 


lib 15/lib 17


69 


19 


7 


3 


123 


6 


0 


6 


140 


24 


8 


3 


197 


6 


0 


6 


198 


113 


0


121 


254 


28 


9 


3 


412 


28 


9 


3 


512 


11 


1 


12 


641 


17 


7 


3 


642 


7 


0 


g 


954 


12 


3 


4 


1011 


209 


16 


14 


1024 


8 


0 


o 


1040 


12 


3 


4 


1055 


26 


7 


4 


1106 


31 


15 


2 


1125 


17 


0 


18 


1129 


17 


0


18 


1138 


109 


0


117 


1244 


14 


1 


15 


1253 


73 


0 


78 


1283 


34 


7 


5 


1285 


34 


7 


5 


1339 


13 


4 


3 


1474 


73 


0 


78 


1505 


18 


3 


6 


1553 


68 


6 


12 


1554 


2542 


14 


195 


1605 


2542 


14 


195 


1628 


6 


0 


6 


1643 


142 


4 


38 


1753 


12 


0 


10 


1764 


13 


0 


14 
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SEQ ID NO: 


lib 15 clones 


lib 17 clones 


lib 15/lib 17 


SEQ ID NO: 


Lib 18 clones 


Lib 20 clones 


lib 18/lib 20


105 


28 


11 


2 


198 


21 


0 


18 


254 


9 


0 


8 


412 


9 


0 


8 


1011 


11 


1 


9 


1138 


14 


0 


12 


1253 


23 


0 


20 


1643 


18 


0 


15 


1764 


12 


0 


10 


3156 


140 


43 


3 



Table 13 

Differentially expressed polynucleotides isolated from samples from two patients 
(patient 2 and patient 3): Lower expression in normal colon tissue (patient 2: lib 15; 
patient 3: lib 18) vs. high metastatic potential colon tissue (patient 2: lib 17; patient 3: 
lib 20).



SEQ ID NO: 


Lib 15 Clones 


Lib 17 Clones


lib 17/lib 15 


321 


3 


23 


7 


363 


1 


9 


8 


836 


21 


99 


4 


859 


6 


20 


3 


885 


n 


28 


2 


916 


13 


28 


2 


981 


2 


11 


5 


1226 


8 


70 


8 


1308 


0 


8 


7 


1317 


29 


84 


3 


1429 


27 


127 


4 


1442 


0 


9 


8 


1534 


1 


12 


11 


1540 


12 


43 


3 


1552 


0 


7 


7 


1556 


1 


9 


8 


1557 


1 


9 


8 


1569 


2189 


5122 


2 


1571 


6 


18 


3 


1576 


3 


25 


8 
SEQ ID NO:   lib 15 clones   lib 17 clones   lib 17/lib 15


1581 


4 


22 


j 


1601 


25 
z. ^ 


157 


u 


1613 


9 


48 


_» 


1616 


15 


61 


4 


1620 


9 

z. 


17 


e 
o 


1622 


4 


99 


73 
z_? 


1626 

I V/ZVJ 




35 




1647 




22 

ZZ 


J 


1664 


4 


28 
zo 


7 


1683 


9 

z. 


1 8 

1 o 


8 

o 


1704 




1 5 




1800 

1 OvU 


0 


7 


7 


9740 

Z / 17 


23 

Z J 


60 


9 
z 


2784 

Z / OH 


4 


14 


j 


7805 


1 
1 


Q 


o 


2076 
Z~ / o 




14 


4 


3128 


18 
1 o 


57 

j / 




312Q 


96 
zu 


1 74 


A 


3 146 


64 


910 
Z 1 w 




^ i 


040 


99A7 
zzo / 


9 

z 


jj Jl 


9 
z 


1 s 


7 










SEQ ID NO:   lib 18 clones   lib 20 clones   lib 20/lib 18
865          0               5               6
1569         1               7               8
1580         1               7               8
1590         1               7               8
2790         0               5               6



EXAMPLE 9 

Polynucleotides Differentially Expressed in High Colon Tumor Potential 
Patient Tissue Versus Metastasized Colon Cancer Patient Tissue 

A number of polynucleotide sequences have been identified that are
differentially expressed between cells derived from colon cancer tissue and cells derived
from colon cancer metastases to the liver. Expression of these sequences in colon
cancer tissue can provide diagnostic, prognostic and/or treatment information associated
with the transformation of precancerous tissue to malignant tissue. This information
can be useful in preventing progression to the advanced malignant state in these
tissues, and can be important in risk assessment for a patient.

The following table summarizes identified polynucleotides with
differential expression between high tumor potential colon cancer tissue and cells
derived from high metastatic potential colon cancer cells:



Table 14

Differentially expressed polynucleotides:
Greater expression in metastatic colon tumor tissue (lib 20) vs.
colon tumor tissue (lib 19)

SEQ ID NO:   lib 19 clones   lib 20 clones   lib 20/lib 19
937          0               6               8
976          0               5               7
1520         1               8               11
1546         1               11              15
1550         1               11              15
1574         1               8               11
1580         0               7               9
1590         0               7               9
1599         8               21              4
1607         158             632             5
1622         1               7               9



Table 15

Greater expression in colon tumor tissue (lib 19) than metastatic colon tissue (lib 20)

SEQ ID NO:   lib 19 clones   lib 20 clones   lib 19/lib 20
105          64              11              4
1011         53              1               40
1226         18              4               3
1571         8               0               6
1726         15              3               4
1811         17              2               6
2749         47              6               6
3146         19              2               7
3324         20              1               15
EXAMPLE 10

Polynucleotides Differentially Expressed in High Tumor Potential
Colon Cancer Patient Tissue Versus Normal Patient Tissue

A number of polynucleotide sequences have been identified that are
differentially expressed between cells derived from high tumor potential colon cancer
tissue and normal tissue. Expression of these sequences in colon cancer tissue can
provide diagnostic, prognostic and/or treatment information associated with the
prevention of the malignant state in these tissues, and can be important in risk
assessment for a patient. For example, sequences that are highly expressed in the
high tumor potential colon cancer cells are associated with or can be indicative of increased
expression of genes or regulatory sequences involved in early tumor progression. A
patient sample displaying an increased level of one or more of these polynucleotides
may thus warrant closer attention or more frequent screening procedures to catch the
malignant state as early as possible.

The following tables summarize polynucleotides that are differentially
expressed between high metastatic potential colon cancer cells and normal colon cells:

Table 16

Differentially expressed polynucleotides detected in samples from a patient (patient 2):
Higher expression in normal colon tissue (patient 2:lib 15)
vs. tumor potential colon tissue (patient 2:lib 16)

SEQ ID NO:   lib 15 clones   lib 16 clones   lib 16/lib 15
69           19              7               3
105          116             54              2
140          24              4               6
197          6               0               6
198          113             3               40
254          28              6               5
412          28              6               5
642          7               0               7
830          10              2               5
938          31              13              3
1011         209             37              6
1095         12              3               4
1125         17              0               18
SEQ ID NO:   lib 15 clones   lib 16 clones   lib 16/lib 15


1 1?Q 

1 1Z7 


1 7 


A 

u 


18 


1 1 Jo 


i no 


l 

! 


lie 

115 


1 

1 Z J J 


ii 

i j 


1 


77 


I ZOj 


1A 
34 


1 1 
13 


3 


1 ZOJ 


1A 
34 


1 1 

13 


*> 
3 


1 HQ 


1 3 


3 


5 


1 4<\1 


1 1 


*> 
3 


4 




ID 


1 


77 




I O 


6 


3 


1 JJH 


2;>42 


/t /I o 

448 


6 


101O 


ZjhZ 


A AO 

448 


6 




3o 


1 A 

14 


3 


103U 


24 


9 


3 


1 A/1 'J 
1043 


142 


2 


75 




J7 


1 A 
1 4 


•> 

3 


1649         24              8               3
1677         19              6               3
1753         13              0               14
1764         13              0               14
1766         177             65              3
1772         24              8               3



Table 17

Differentially expressed polypeptides detected in samples from patient. Lower
expression in normal colon tissue (lib 18) than colon tumor tissue (lib 19)

SEQ ID NO:   lib 18 clones   lib 19 clones   lib 19/lib 18
3146         3               19              6
3150         21              228             10
3324         3               20              6



WO 01/02568 PCT/US00/18374



Table 18

Differentially expressed polypeptides detected in samples from patient. Higher
expression in normal colon tissue (lib 18) than colon tumor tissue (lib 19)

SEQ ID NO:   lib 18 clones   lib 19 clones   lib 18/lib 19
198          21              2               12
465          6               0               7
489          6               0               7
745          6               0               7
859          11              2               6
976          7               0               8
1011         209             37              6
1045         8               1               9
1138         14              0               16
1253         23              0               26
1392         16              4               5
1474         23              0               26
1589         6               0               7
1591         22              11              2
1607         386             158             3
1643         18              0               21
1753         12              0               14
1764         12              0               14










SEQ ID NO:   lib 18 clones   lib 19 clones   lib 19/lib 18
105          28              64              2
1011         11              53              4
1226         2               18              8
1251         6               19              3
1559         1               9               8
1571         0               8               7
1608         1               9               8
1766         2               13              6
1782         1               9               8
1811         1               17              15
Table 19

Differentially expressed polynucleotides:
Higher expression in colon tumor tissue
(patient 2, lib 16) vs. normal colon tissue (patient 2, lib 15)

SEQ ID NO:   lib 15 clones   lib 16 clones   lib 16/lib 15
7            1               9               0
164          6               19              [illegible]
734          4               15              4
836          21              53              2
928          2               11              [illegible]
965          2               11              [illegible]
987          2               11              [illegible]
1026         7               19              3
1044         4               16              4
1119         4               16              4
1226         [illegible]     46              [illegible]
1227         [illegible]     [illegible]     [illegible]
1251         7               95              [illegible]
1316         [illegible]     [illegible]     [illegible]
1429         27              81              [illegible]
1442         0               [illegible]     [illegible]
1540         12              28              [illegible]
1553         68              590             8
1560         4               24              6
1577         1               10              9
1588         5               20              4
1610         3               13              4
1620         2               23              11
1626         6               23              4
1673         2               15              7
2416         0               7               7
2749         23              54              2
2976         3               14              4
3129         26              64              2
3132         18              54              3
EXAMPLE 11

Polynucleotides Differentially Expressed in Growth Factor-Stimulated
Human Microvascular Endothelial Cells (HMEC) Relative to Untreated
HMEC

A number of polynucleotide sequences have been identified that are
differentially expressed between human microvascular endothelial cells (HMEC) that
have been treated with growth factors and untreated HMEC.

Sequences that are differentially expressed between growth factor-treated
HMEC and untreated HMEC can represent sequences encoding gene products involved
in angiogenesis, metastasis (cell migration), and other developmental and oncogenic
processes. For example, sequences that are more highly expressed in HMEC treated
with growth factors (such as bFGF or VEGF) relative to untreated HMEC can serve as
markers of cancer cells of higher metastatic potential. Detection of expression of these
sequences in colon cancer tissue can provide diagnostic, prognostic and/or treatment
information associated with preventing progression to the malignant state in these
tissues, and can be important in risk assessment for a patient. A patient sample
displaying an increased level of one or more of these polynucleotides may thus warrant
closer attention or more frequent screening procedures to catch the malignant state as
early as possible.

The following table summarizes identified polynucleotides with
differential expression between growth factor-treated and untreated HMEC.

Table 20

Differentially expressed polynucleotides:
Higher expression in untreated HMEC (lib 12) vs. bFGF treated HMEC (lib 13)

SEQ ID NO:   lib 12 clones   lib 13 clones   lib 12/lib 13
849          6               0               6
1059         6               0               6
1206         12              2               6
3208         12              0               12

Lower expression in untreated HMEC (lib 12) vs. bFGF treated HMEC (lib 13)

2748         3               12              4
3325         0               6               6



Table 21

Differentially expressed polynucleotides:
Higher expression in untreated HMEC (lib 12) vs. VEGF treated HMEC (lib 14)

SEQ ID NO:   lib 12 clones   lib 14 clones   lib 12/lib 14
1150         9               0               9

Lower expression in untreated HMEC (lib 12) vs. VEGF treated HMEC (lib 14)

3324         22              50              2



EXAMPLE 12

Polynucleotides Differentially Expressed in Normal Prostate Cells
Relative to Prostate Cancer Cells

A number of polynucleotide sequences have been identified that are
differentially expressed between normal prostate cells and prostate
cancer cells. Expression of these sequences in prostate tissue suspected of being
cancerous can provide diagnostic, prognostic and/or treatment information. These
polynucleotide sequences can also be used in combination with other known molecular
and/or biochemical markers. The following table summarizes identified
polynucleotides with differential expression between normal prostate cells and
prostate cancer cells:
Table 22

Differentially expressed polynucleotides: normal prostate cell line (lib 21)
vs. prostate cancer cell line (lib 22)

Higher in lib 21

SEQ ID NO:   lib 21 clones   lib 22 clones   lib 21/lib 22
53           17              2               8
1754         22              8               3
1801         7               0               7
1845         22              6               4
446          8               0               8
1410         6               0               6
2060         18              6               3
2143         12              3               4
2632         13              1               13
2899         16              2               8
3338         12              2               6

Higher in lib 22

86           2               13              7
93           0               9               9
687          0               9               9
1269         1               15              15
1581         25              74              3
1647         25              74              3
1649         12              27              2
1710         5               16              3
1717         5               16              3
1772         12              27              2
1960         0               6               6
2987         0               6               6
3128         13              42              3
3132         13              42              3
3150         263             962             4
3222         0               6               6
3268         0               6               6
EXAMPLE 13

Polynucleotides Differentially Expressed Across Multiple Libraries

A number of polynucleotide sequences have been identified that are
differentially expressed between cancerous cells and normal cells across two or more
tissue types tested (i.e., breast, colon, lung, and prostate). Expression of these
sequences in a tissue of any origin can provide diagnostic, prognostic and/or treatment
information associated with preventing progression to the malignant state in these
tissues, and can be important in risk assessment for a patient. These polynucleotides
can also serve as non-tissue-specific markers of, for example, risk of metastasis of a
tumor. The following polynucleotides were differentially expressed but without tissue
type-specificity in at least two of the breast, colon, lung, and prostate libraries tested:
53, 105, 355, 412, 614, 836, 1442, 1581, 1647, 1649, 1664, 1772, 1782, 1811, 1845,
1856, 1875, 1923, 2060, 2071, 2135, 2146, 2239, 2313, 2378, 2393, 2416, 2460, 2490,
2632, 2674, 2704, 2724, 2749, 2784, 2804, 2959, 2976, 2977, 2980, 2987, 3009, 3047,
3128, 3129, 3132, 3146, 3150, 3156, 3210, 3324, 3331, and 3335.
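
The selection criterion used in this example (differential expression in at least two tissue types) amounts to counting, for each SEQ ID NO, the number of tissue types in which it appears. A minimal sketch of that counting follows. The function name shared_across_tissues and the dictionary layout are illustrative assumptions, and the two small input sets are taken only from the colon rows of Table 17 and the last five "Higher in lib 22" prostate rows of Table 22, not from the full breast, colon, lung, and prostate data.

```python
from collections import Counter


def shared_across_tissues(hits_by_tissue, min_tissues=2):
    """Return the SEQ ID NOs that appear in at least `min_tissues` tissue types.

    `hits_by_tissue` maps a tissue name to an iterable of SEQ ID NOs found to be
    differentially expressed in that tissue (hypothetical layout for illustration).
    """
    counts = Counter()
    for seq_ids in hits_by_tissue.values():
        # set() ensures a SEQ ID NO is counted once per tissue, even if listed twice
        counts.update(set(seq_ids))
    return sorted(seq_id for seq_id, n in counts.items() if n >= min_tissues)


# Illustrative subsets only: colon IDs from Table 17, prostate IDs from Table 22.
hits = {
    "colon": {3146, 3150, 3324},
    "prostate": {3128, 3132, 3150, 3222, 3268},
}
print(shared_across_tissues(hits))  # [3150]
```

With these two subsets the only shared identifier is 3150, which is also present in the list above.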

Those skilled in the art will recognize, or be able to ascertain, using not
more than routine experimentation, many equivalents to the specific embodiments of
the invention described herein. Such specific embodiments and equivalents are
intended to be encompassed by the following claims.

All publications and patent applications cited in this specification are
herein incorporated by reference as if each individual publication or patent application
were specifically and individually indicated to be incorporated by reference. The
citation of any publication is for its disclosure prior to the filing date and should not be
construed as an admission that the present invention is not entitled to antedate such
publication by virtue of prior invention.

Although the foregoing invention has been described in some detail by
way of illustration and example for purposes of clarity of understanding, it is readily
apparent to those of ordinary skill in the art in light of the teachings of this invention
that certain changes and modifications may be made thereto without departing from the
spirit or scope of the appended claims.

Deposit Information:

The following materials were deposited with the American Type Culture
Collection (ATCC); CMCC = Chiron Master Culture Collection:
cDNA Libraries Deposited with ATCC

Tube Number   Deposit Date    ATCC Accession No.   CMCC Accession No.
ES137         May 30, 2000
ES138         May 30, 2000
ES139         May 30, 2000
ES140         May 30, 2000
ES141         May 30, 2000
ES142         May 30, 2000
ES143         May 30, 2000
ES144         May 30, 2000
ES145         May 30, 2000
ES146         May 30, 2000
ES147         May 30, 2000
ES148         May 30, 2000
ES149         May 30, 2000
ES150         May 30, 2000
ES151         May 30, 2000
ES152         May 30, 2000
ES153         May 30, 2000
ES154         May 30, 2000
ES155         May 30, 2000
ES156         May 30, 2000
ES157         May 30, 2000
ES158         May 30, 2000
ES159         May 30, 2000
ES160         May 30, 2000
ES161         May 30, 2000
ES162         May 30, 2000
ES163         May 30, 2000
ES164         May 30, 2000
ES165         May 30, 2000
ES166         May 30, 2000
ES167         May 30, 2000



Table 23 lists the clones for each deposit, designated by "tube" number.
This deposit is provided merely as a convenience to those of skill in the art, and is not an
admission that a deposit is required under 35 U.S.C. §112. The sequences of the
polynucleotides contained within the deposited material, as well as the amino acid
sequences of the polypeptides encoded thereby, are incorporated herein by reference and
are controlling in the event of any conflict with the written description of sequences
herein. A license may be required to make, use, or sell the deposited material, and no
such license is granted hereby.

Retrieval of Individual Clones from Deposit of Pooled Clones

Where the ATCC deposit is composed of a pool of cDNA clones, the
deposit was prepared by first transfecting each of the clones into separate bacterial cells.
The clones were then deposited as a pool of equal mixtures in the composite deposit.
Particular clones can be obtained from the composite deposit using methods well
known in the art. For example, a bacterial cell containing a particular clone can be
identified by isolating single colonies, and identifying colonies containing the specific
clone through standard colony hybridization techniques, using an oligonucleotide probe
or probes designed to specifically hybridize to a sequence of the clone insert (e.g., a
probe based upon unmasked sequence of the encoded polynucleotide having the
indicated SEQ ID NO). The probe should be designed to have a Tm of approximately
80°C (assuming 2°C for each A or T and 4°C for each G or C). Positive colonies can
then be picked, grown in culture, and the recombinant clone isolated. Alternatively,
probes designed in this manner can be used in PCR to isolate a nucleic acid molecule
from the pooled clones according to methods well known in the art, e.g., by purifying
the cDNA from the deposited culture pool, and using the probes in PCR reactions to
produce an amplified product having the corresponding desired polynucleotide
sequence.
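
The Tm rule stated above (2°C for each A or T, 4°C for each G or C) can be checked with a few lines of arithmetic. The sketch below is only an illustration of that rule: the function name estimate_tm is an assumption, and the 26-base probe is a hypothetical sequence constructed to reach the 80°C target, not a sequence from the SEQ ID listing.

```python
def estimate_tm(probe: str) -> int:
    """Estimate probe Tm in degrees C: 2 per A or T, 4 per G or C."""
    seq = probe.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters in probe: {sorted(unexpected)}")
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


# Hypothetical 26-mer: 14 G/C bases (56 degrees) plus 12 A/T bases (24 degrees).
probe = "GC" * 7 + "AT" * 6
print(estimate_tm(probe))  # 80
```

Under this rule a probe reaches approximately 80°C whenever twice its A/T count plus four times its G/C count is close to 80, so GC-rich probes can be shorter than AT-rich ones.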

Table 23 





Clone Name        Tube
M00001351A:B02    ES 137
M00003995B:E03    ES 137
M00001356A:H11    ES 137
M00004046C:A08    ES 137
M00001363D:D09    ES 137
M00004105D:D05    ES 137
M00001395D:H02    ES 137
M00004139B:B10    ES 137
M00001439C:H06    ES 137
M00004140D:C03    ES 137
M00001476B:G10    ES 137
M00004144A:H05    ES 137
M00001582A:E02    ES 137
M00004152A:C12    ES 137
M00003750D:E06    ES 137
M00004155D:A10    ES 137
M00003761C:F02    ES 137
M00004168A:G11    ES 137
M00003770A:E05    ES 137
M00004197B:H10    ES 137
M00003786A:A11    ES 137
M00004222C:E03    ES 137
M00003800A:F09    ES 137
M00004234A:E07    ES 137
M00003816D:E11    ES 137
M00004239B:F11    ES 137
M00003902A:C03    ES 137
M00004241B:H07    ES 137
M00003991C:F06    ES 137
M00004264B:A05    ES 137
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:\m : 


M00004278A:F09 


IES 

I 


137 


M00004282D:C11 


|ES 


137 


M00004308C:C06 


iES 


137 


M00004340C:C07 


jES 


137 


M00004354D:E05 


IES 


137 


M00004361A:H02 


jES 


137 


M00004372B:F07 


;ES 


137 


m66004378A:B10"" 


jES 


137 


M00004393B:E07 


|ES 


137 


M00023282A:C02 


ES 


137 


M00023300D:C11 


|ES 


137 


M00023316C:G08 


]ES 


137 


M00023333D:C12 


Fes 


137 "j 


M00023352B:F03 


jES 


137 


M0OO23352D:HO3 


j ES 


137 


M00023376B:G04 


i ES 


137 


M00023377B:F01 


"'} ES 


137 


M00023398B:D12 


jES 


137 


M00023399C:EIO 


iES 


137 


M00026803A:F08 


IES 


137 


M00026843B:D10 


Tes 


137 


|m00026850D:F69 


jES 


137 | 


M00026851B:F01 


| ES 


137 


M00026856D:F02 


jES 


137 


M00026857D:G12 


jES 


137 


M00026859DD01 


Tes ' 


137 


M00026860B:C05 


;es" 




M00026865B:A06 


jES 


137 


M00026868C:E1I 


jES 


137 


M00026878A:F05 


jES 


137 


M00026882D:G09 


ES 


137" 


M00026885A:H09 


ES 


137 


M00026901A.G07 


IES 


137 



M00026914A:H10 IES 137 



M0O026915 B :C06 j ES 137 
iM6db26918B:D6T TES 137" 



M00026922C:B02 


ES 


137 


M00026922C:G03 


ES 


137 


M00026926A:E10 [ ES 


137 


M00026927D:F02 


ES 


137 


M00026928D:A03 

■ 


ES 


137 



M00026935C:B04 
M000269 4 [57X 04 
M 000269 44B:E03 
M00026946A:F12" 



IES 137 

j 

jES 137 ~ 
j ES J37 3| 
j ES 137" j 



? Clohe Name 


■ • Tufe|S 


M00026980A:D09 


ES 137 


M00027016A:B06 


ES 137 


M00027018A:C09 


ES 137 


M00027021A:G02 


ES 137 


M00027022D:G11 


ES137 1 


M00027030C:H06 


ES137 j 


M00027035D:C06 


ES137 j 



!M00027049B:F05 



M00027078A:B02 



M00027080A:B01 



M00027085C:E11 



M00027094A:B03 



ES 137 



ES 137 



ES 137 



ES 137 



iM00027103B:A09 IES 137 



M00027108C:B03 



!M00027121D:C05 



j M00027135A:B11 
^000 271 36C:C 09 
fMO0Oi27141C:HO3 



ES 137 
iES 137 
t^l"37 

Tes T37 

+ ES 137 



M00027162B:F05 



|M00027178B:G09 
i M00027 f 795": E06 



ES 137 



M00027181D:A05 



ES 137_ 
ES 138 
ES 138 



| M0002 7198B:B08 
! M(^7200A:F02 
1 M00027207B:F07 



— 1_ 



IES 138 
IES 138 
ES 138 



s M0002721 2D:E03 
}M06627228D:A0r 
|M00027232DrB08" 
iM00027233iB7C0r 



M00027236A:E04 



jES 138 
jES 13 8' 
jES 138 
jES 138 

Fes 13? 



M00027237C:B08 I ES 138 



M0002724 8A: C02 [ES 138 
!M00027256B7h09 
M60027258A:A07 



— 



IES 138 
ES 138 



M00027263A:F10 



ES 138 



M00027292D:F10 <ES 138 



M00027297A:C04 IES 138 



M00027299B:B12 ES 138 



!M00027301A:G05 ES 138 



jM00027301B:B08 
? m60027314C:D09 



. . / 



|M00027319D:B11[ 
j N^00^324D:C05 



ESJ38_ 
ES 138 



|M00027159D:F03 iES 137 



iM00027195C:E04 jES 138 



ES 138 



ES 138 
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■ v Gloria Name ''"'W' 


» ■ * - » * « * i Bi'i «* 


jM00027347C:G07 


jES 138 


! M00027355A:B07 


ES 138 


'M00027359B:G05 


|ES 138 


M00027366A:F11 


ES 138 


M00027379C:B07 


ES 138 


M00027392B:H02 


ES 138 


M00027396D:G08 


jES 138 


h M00027398C:F07 


ES 138 


M00027438C:G07 


|ES 138 


M00027462A:D07 


ES 138 


M00027462B.H07 


ES 138 


M00027468A:C09 


ES 138 


;M00027475B:E10 


ES 138 


|M00027476A:C09 


ES 138 


|M00027486A:F06 


ES 138 


IM00027520A:C05 


ES 138 


>M00027525B:D06 


ES 138 


|M00027526D:F03 


ES 138 


jM00027528C:B10 


ES 138 


[M00027537C:B0F 


ES 138 


|M00027546C:B10 ! 


ES 138 


|M00027591B:C04 


ES 138 


!M00027596A:A10 i 


ES 138 



M00027596C:E06 IES 138 



jM 00027602B:C01 j ES 138 
rM00027615A:FKT 1 ES 138 



[M000276 17 B: C12 jES 13 8 
j M 0002762 6~D:F1 1 I ES 138 



iM0002 7 625A:H01 I ES 1 3 8 

i * 

! MM02T634A:Drr™"TESll? 



M00027641C:A03 IES 138 



j MQ0027647C:D03 _[ES]38 
jM00027652B^Fir' jES 138 
|M00027668C:H12 TeS 138 



i M00027729D:H06 

i „ 



|ES 138 



M00027733A;A02 



M00027741B:F09 



i ES J 3 l 
IES 138 



!M00027743A:C03 IES 138 



M00027801C:C11 



!M00027813C:F01 



[ES 138 
! ES 138 



iM00O27818C:C07 !ES 138 



|M00027836D:F12 :ES 138 



M00027837C:D09 jES 138 



[M0002 8120D:F 1 2 j ESJ1 3 8 
!M00028066C:b07 [ES 138 



■. ©]bhfNa&e W 

• - - l-iS^tC. 




M00028184D:G10 


ES 138 


M00028185B:A06 


ES 138 


M00028196D:A03 


ES 138 


M00028201B:H12 


ES 138 

■^rf *fcX 1 *x XX 


M00028207DE09 


ES 138 1 

* * 1 *j/ XJ» 1 


M00028210BD02 


ES 138 i 

V.X I *X XX 1 


M00028212CB08 


ES 138 ! 


M00028215DF03 


ES 138 


M00028220AB04 

* w • V XX XX *W XX xx * m. m XX ■ 


ES 138 


M00028314DF05 


ES 138 


M00028316BH12 

I ▼ 1 w XX XX XX*-/ 1 VJ Aw' till 


ES 138 i 


M00028354AB12 


ES 138 i 


M00028354DA03 

* ▼ X XX \/ \x <aM XX**/ «x >^ * ^ a X B XX 


ES 138 i 


M00028357AG10 


ES 138 


M00028362AG1 1 


ES 138 


HVI00028364GG08 


ES 138 


fM00028369DE08 

f J XX V/ XX ^ xx *y XX -X l_*x p Vx XX 


ES 138 1 


IM00028617CA12 


ES 138 1 

L-< O 1 JO f 


|M00028768C:D05 


ES 138 i 


M00028770AD04 

t i ▼ i xx v V w Vx ' r xx / m • x^ \y 


l^S 138 n 


|M00028772C:B09 


ES 138 j 


|M00028775D:F03 


ES138 | 


|M00028777B:GI2 


ES 138 '! 


|M00031368A:E10 


ES 138 | 


[M00031417C:G09 


ES 138 ! 


M00031419D:C04 
[Mb003i"485D:G02 


ES 138 1 
ES 138 "j 



M00032480B:E10 



:M00032492A:C01 
SM00032495B:D02 



!M00032499C:A01 



ES 139 



ES 139 



ES 139 



E S 139 
jES 139 

M00032510D:F12 S ES 139 



jM00032508B:H03 



IM00032510D:G06 IES 139 



]M00032513D:F01 


ES 139 i 


; M00032530D:C02 


ES 139 j 


|M00032535D:H01 


ES 139 1 


!M00032539B:C11 


ES 139 | 


;M00032540A:A09 


ES 139 | 


iM00d32541D:H08 


ES 139 | 


|M00032545B:H09 


ES 139 | 


IM00032545D:G05 
jM00032550D:C02 


ES 139 j 
ES 139 j 


|M00032551B:G05 


ES 139 J 
ES 139 ~( 


;M00032577A:C04 ; 



\ 
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M00032797B:G02 



M00032808B:G10 



r 



M00032811B:D02 



1VI 

! M00032829B:E06 



M00032^71D:E11 
"MW03287^:D06 



M00032907A:G04 



M00032909A:B06 



M00032917D:G09 



M00032918B:D08 





Mini 


M00032578A:G06 


ES 


139 


M00032584A:H08 


ES 


139 


M00032592A:H11 


ES 


139 


M00032597C:B01 


ES 

Bur 


139 


M00032638C:G08 


ES 


139 


M00032638D:A06 


ES 


139 


M00032668D:G12 


ES 


139 


jM00032678C:D06 


ES 


139 


M00032688D:D11 


ES 


139 


M000327l2B:G02 


ES 


139 


|M00032724A:C05 


FS 


139 


M00032725C:F06 


ES 


739 


|M00032726C:C01 j 


FS 


139 


|M00032731B:C10 ! 

E 


ES 


139 


|M00032731C:C07 


ES 


139 


M00032737B:E09 ! 


ES 


139 


M00032739A:A06 { 


ES 


139 


M00032744B:F10 


ES 


139 


M00032766B:D12 ! 


ES 


139 


i M0603276~6C:A04 


ES 


139 


1 M00032790B:A07 | 


ES 


139 



1~ 



M00032793A:F06 !ES 139 



!ES 139 
|ESl39 
[ESJ39 
IES 139 



M00032830D:G03 IES 139 



jM000328 31 C:G07 j ES 139 
|m00032853D:G12 ! EST39 



M00032864B:B09 ;ES 139 



IES 139 
ES 13 9 
ES 1 39 
ES 73 9 



!ES^139 
IES 139 



M00032918B:E06 IES 139 



M00032918C:B10 IES 139 



M00032942D;C12 | ES J39 
M00032944B:B02 Tes739~ 



|M00032984C:G05 f ES 139 




M00032921B:H08 


IES 


139 


jM00032933A:C10 


Ies 


139 


1M00032939B.E07 


;es 


139 


|m66032940A£1c 


IES 


139 



rGroneMame: 



|M00032990B:A11 

i — 



M00032994A:A08 



M00032995C:C05 



!M00033007C:E01 



M00033019B:E10 



h 



M00033033C:H01 



| M0Q03 3034C:A06 
iM00033034C:F02"" 



M00033037D:C11 



|M0Q033074A:C08 
|mOQ03313QB:F06 
IM00033T40D-F66" 



••sr.- 



ES 139 



ES 139 



ES 139 



ES 139 



ES 139 



ES 139 



ESJ39 
ES 139 



!M00033173D:C01 



!M00033176B:E12 



M00033186C:D11 
m60033T89D:F08 




M00033202D:G06 



1M000 33204B:A07 
|M00d33205A:FQ3 
fM00033217B:H07 



|M00033218A:C04 



!M00033223B:H07 



|M00033226A:A11 



M00033231D:B09 



ES 139 
ES 139 



ES 139 
ES 139 



ES \39 
ESJ39 

m39 

ES 139 



ES 139 
ES 739 



ES 139 



ES 139 



|M00033231ip:G10 
:M0M33243B;A05 



IES 139 
ES 739 



!M00033246C:E08 



M00033248A:B02 



jM00033261C:D12 
|M00033262D:Al7 
;M00033263B:G04 
jM00^276B:G08 
|M00033185C5bof 



ES 139 



ES 139 



ES 139 
ES 1 39 



ES 139 

Ies 73 9 

ES 139 



|M0 0033288B:D1 2 
jMM)033300D:HI2 
I M0^33306D:G08 



ES 140 



M00033306D:H09 



M00033308B:G05 



M00033343C:H08 



ES 140 
ES 140 



ES 140 



ES 140 



M00033345D:A09 



M00033346C:A05 



M00033347C:F02 



M00033349D:F05 



M00033362C:C05 



ES 140 
ES 140 



ES 140 
ES 140 



ES 140 



ES 140 



ES 140 
Clone Name        Tube
M00033375A:G04    ES 140
M00033376A:C12    ES 140
M00033377D:A05    ES 140
M00033410B:C09    ES 140
M00033424B:A04    ES 140
M00033424D:H12    ES 140
M00033425A:C10    ES 140
M00033427D:F01    ES 140
M00033432B:H10    ES 140
M00033437C:A07    ES 140
M00033437C:C03    ES 140
M00033442A:D06    ES 140
M00033446C:G08    ES 140
M00033446D:B02    ES 140
M00033450C:A02    ES 140
M00033451A:H01    ES 140
M00033454A:D09    ES 140
M00033457D:A05    ES 140
M00033560D:G07    ES 140
M00033561C:A02    ES 140
M00033566C:E08    ES 140
M00033570B:C08    ES 140
M00033570B:E06    ES 140
M00033570C:C10    ES 140
M00033578D:G02    ES 140
M00033581C:H10    ES 140
CLAIMS

We claim:

1. A library of polynucleotides, the library comprising the sequence
information of at least one of SEQ ID NOs:1-3351.

2. The library of claim 1, wherein the library is provided on a nucleic
acid array.

3. The library of claim 1, wherein the library is provided in a
computer-readable format.

4. The library of claim 1, wherein the library comprises a
polynucleotide corresponding to a gene differentially expressed in a cancer cell of high
metastatic potential relative to a control cell, wherein the control cell is a normal cell or a
cell of low metastatic potential, wherein the expression is greater in the metastatic tissue,
and wherein the sequence is selected from the group consisting of SEQ ID NOs:14, 137,
151, 152, 171, 200, 254, 262, 271, 348, 412, 472, 507, 520, 530, 588, 623, 637, 660, 678,
680, 700, 714, 774, 812, 834, 901, 937, 976, 1168, 1333, 1352, 1520, 1524, 1546, 1550,
1574, 1580, 1590, 1599, 1607, 1622, 1706, 1752, 1768, 1769, 1780, 1781, 1799, 1803,
1811, 1851, 1856, 1867, 1872, 1875, 1884, 1919, 1923, 1939, 1975, 2024, 2045, 2060,
2071, 2118, 2119, 2128, 2135, 2177, 2181, 2184, 2185, 2190, 2193, 2232, 2239, 2283,
2311, 2314, 2338, 2378, 2393, 2394, 2395, 2398, 2460, 2490, 2505, 2514, 2540, 2542,
2597, 2607, 2640, 2657, 2669, 2670, 2674, 2679, 2684, 2707, 2724, 2757, 2776, 2804,
2818, 2906, 2959, 2964, 2968, 2976, 2980, 2987, 3010, 3043, 3047, 3050, 3071, 3072,
3092, 3095, 3097, 3140, 3157, 3173, 3187, 3203, 3210, 3212, 3220, 3236, 3249, 3264,
3284, 3288, 3305, 3309, 3318, 3330, 3331, and 3335.

5. The library of claim 1, wherein the library comprises a
polynucleotide corresponding to a gene differentially expressed in normal colon tissue
relative to colon cancer tissue, wherein the expression is greater in the cancer tissue, and
wherein the sequence is selected from the group consisting of SEQ ID NOs:7, 164, 734,
836, 928, 965, 987, 1026, 1044, 1119, 1226, 1227, 1251, 1316, 1429, 1442, 1540, 1553,
1560, 1577, 1588, 1610, 1620, 1626, 1673, 2416, 2749, 2976, 3129 and 3132.

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6. The library of claim 1 , wherein the library comprises a 
polynucleotide corresponding to a gene differentially expressed in normal colon tissue 
relative to colon cancer tissue, wherein the expression is greater in normal tissue than 
cancer tissue, and wherein the sequence is selected from the group consisting of SEQ ID 
NOs:105, 198,465,489, 745, 859,976, 1011, 1045, 1 1 38, 1226, 1251, 1253, 1392, 1474, 
1559, 1571, 1589, 1591, 1607, 1608, 1643, 1753, 1764, 1766, 1782, 1811,2749,2784, 
2790, 2805, 2976, 3 1 28, 3 1 29, 3 146, 3 1 50, and 3 1 5 1 . 

7. The library of claim 1 , wherein the library comprises a 
polynucleotide corresponding to a gene differentially expressed in normal human 
prostate cells relative to human prostate cancer cells, wherein the expression is greater 
in normal cells than cancer cells, and wherein the sequence is selected from the group 
consisting of SEQ ID NOs:53, 446, 1410, 1754, 1801, 1845,2060,2143,2632, 2899, 
and 3338. 

8. The library of claim 1 , wherein the library comprises a 
polynucleotide corresponding to a gene differentially expressed in normal human 
prostate cells relative to human prostate cancer cells, wherein the expression is greater 
in cancer cells than normal cells, and wherein the sequence is selected from the group 
consisting of SEQ IDNOs:86,93, 687, 1269, 1581, 1647, 1649, 1710, 1717, 1772, 
I960, 2987, 3128, 3132, 3150, 3222, and 3268. 

9. An isolated polynucleotide comprising a nucleotide sequence 
having at least 90% sequence identity to an identifying sequence of SEQ ID NOs: 1-335 1 or 
a degenerate variant or fragment thereof 

10. A recombinant host cell containing the polynucleotide of claim 9. 

11. An isolated polypeptide encoded by the polynucleotide of claim 9. 

12. An antibody that specifically binds a polypeptide of claim 11. 

13. A vector comprising the polynucleotide of claim 9. 

14. A method of detecting differentially expressed genes correlated 
with a cancerous state of a mammalian cell, the method comprising the step of: 

detecting at least one differentially expressed gene product in a test sample 
derived from a cell suspected of being cancerous, wherein the gene product is encoded by a 
gene corresponding to a sequence of at least one of SEQ ID NOs:14, 137, 151, 152, 171, 
200, 254, 262, 271, 348, 412, 472, 507, 520, 530, 588, 623, 637, 660, 678, 680, 700, 714, 
774, 812, 834, 901, 937, 976, 1168, 1333, 1352, 1520, 1524, 1546, 1550, 1574, 1580, 
1590, 1599, 1607, 1622, 1706, 1752, 1768, 1769, 1780, 1781, 1799, 1803, 1811, 1851, 
1856, 1867, 1872, 1875, 1884, 1919, 1923, 1939, 1975, 2024, 2045, 2060, 2071, 2118, 
2119, 2128, 2135, 2177, 2181, 2184, 2185, 2190, 2193, 2232, 2239, 2283, 2311, 2314, 
2338, 2378, 2393, 2394, 2395, 2398, 2460, 2490, 2505, 2514, 2540, 2542, 2597, 2607, 
2640, 2657, 2669, 2670, 2674, 2679, 2684, 2707, 2724, 2757, 2776, 2804, 2818, 2906, 
2959, 2964, 2968, 2976, 2980, 2987, 3010, 3043, 3047, 3050, 3071, 3072, 3092, 3095, 
3097, 3140, 3157, 3173, 3187, 3203, 3210, 3212, 3220, 3236, 3249, 3264, 3284, 3288, 
3305, 3309, 3318, 3330, 3331, and 3335. 

wherein detection of the differentially expressed gene product is correlated with 
a cancerous state of the cell from which the test sample was derived. 

15. A method of detecting differentially expressed genes correlated 
with a cancerous state of a mammalian cell, the method comprising the step of: 

detecting at least one differentially expressed gene product in a test 
sample derived from a cell suspected of being cancerous, wherein the gene product is 
encoded by a gene corresponding to a sequence of at least one of SEQ ID NOs:7, 164, 
734, 836, 928, 965, 987, 1026, 1044, 1119, 1226, 1227, 1251, 1316, 1429, 1442, 1540, 
1553, 1560, 1577, 1588, 1610, 1620, 1626, 1673, 1960, 2416, 2749, 2976, 2987, 3128, 
3129, 3132, 3150, 3222, and 3268. 

wherein detection of the differentially expressed gene product is correlated with 
a cancerous state of the cell from which the test sample was derived. 

aagagagatc ctaaagaaag caaaatcact

<210> 1144 <211> 458 <212> DNA <213> Homo sapien
ggcacgagga gagagagaga gagagagaga gagagagaga gagagagaga gagagagaga
gagagagaga gagagagaga gagagagaga gagagagagc gagagagaga gagagagcgc
gagagagaga gatatatata tatctctcgc gctcgcgcgc gctctctctc tctttttttc
tcttttgcgc gatttctctc gcgccccccc ttctctctct ctctctctct ccctctctct
ttctctctct gtctctcact ctctctcttt cttttttttt ttatacactc tctctctctc
tctctctctc tccctctctc tctttgtttc tcccgcgaga tctgtgtctc ttctttttgg
gaagacaccc tctctctccg ccccctcttt gcgccttttt gagatacccc cccccctctc
tttcctcttt tttttctcgg gggcttctcc cgtctttt

<210> 1145 <211> 391 <212> DNA <213> Homo sapien
tacggctgcg agaagacgac agaagggaca ataccgcatt ataaagattg tgagaggtag
ctgtttgtta atgtccaaat ctcaaccaaa gagtacaata catacaaaat attacagtga
catggcctaa gtaaaaaaaa aaaacttaaa actgtcggaa aacacccatg aaaataaaga
ggtccacatt aattttaaaa atttaaccta aatgggaaca caggtaccta tttaaattcg
gaaaaaaata gaatatcagg taaaggatga aaaatatatt agaatttatg gaggtggaaa
atggaaatag aaataatccc tgtggccagg tgcagtggct catgtctgta gtcccagcac
tttgggagtt gaggcggcag acacttgaac c

<210> 1146 <211> 391 <212> DNA <213> Homo sapien
tacggctgcg agaagacgac agaaggggga tagcacttta tgacaaaata ggactgattt
ttaaatttag cagaattatc tatggattgt ctagatctca gctgatatta aaatataact
atattcaagt atttcattag attaaagagc agaggataag gctgaattta aaaattttat
atctcggccc ggtgcagtgg ctcactcctg taatcccagc actttgggag gccgaggcag
gcggatcact tgaggtcagg agttcgagac cagcctggcc aacgtggtga aacaccatct
gtactaaaaa aacttcactg ggcgtcttgg cgcacgcctg taatcccagc tactggggag
gctgaggcat gagaatcact tgaacctgag a

<210> 1147 <211> 456 <212> DNA <213> Homo sapien
tcttttggcc gaagcggcct acggctgcga gaagacgaca gaaggggtct gttggattca
aaattttgta agccattttc acaagtacaa agatacattt taaccttgtc ttctccaaaa
ttactgagta ggaattttat ttttatcttt ttgagacggg gtatcactgt cacccacact
ggagcgcggt ggtgggatct tggcttactg tgacctctgc ctcccgggtt caaatggtcc
tccctcctca gtctcctgag tggctgggac ggcaggcgcg tgccaccatg cccagctaat
ttgttctatt ttttctgtag agacggagtt ttgccatgtt gcccgggctg gtctcagact
cctgagctca ggcgatcatt tcgcctcggc ctcccagagt gctgagattg gaggtgtggg
ccacagcatc tggcccagag tgaggagaat catgag

<210> 1148 <211> 385 <212> DNA <213> Homo sapien
tacggctgcg agaagacgac agaagggcat tcattatcag gaagttttag ttatctgtca
tttttttttt tcacatcagt ttgatcagga aagtgtataa cacatcttaa agcaagagtt
agtttggtat taaatcctca ttagaacaac cacctgtttc actaataact tacccctgat
gagtctatct aaacatatgc attttaagcc ttcaaattac attatcaaca tgagagaaat
caccaccaaa gaagatgttc aaaataatag tcccatatct gtaatcatat ctacatgcaa
tgttagtaat tctgaagttt tttaaattta tggctatttt tacacgatga tgaattttga
cagtttgtgc attttcttta tacan

<210> 1149 <211> 383 <212> DNA <213> Homo sapien
tacggctgcg agaagacgac agaagggggg taagggatgt tcacttcaga agtcattggg
gagtttccag cgatgtcaca tctgactacc ccggcatgga tatatagaat ctcttggatc
ataccatcct ggctctgttc cagatttagt ttgcttgttt gatcttgagt atttttgttt
tgttttgttt ttgagacgga gtcttgctcg gtcacccagg ctggagtgca gaggtgtgat
ctcggctcac tacaacctcc acctcccggg tttaagcgat tctcccgttc agtctcccgg
gtagctgtga ttacaggcac ctgccatcat gctcaggtag tttttgtatt tttgtaacga
cggggtttca ccacgttggc cag

<210> 1150 <211> 381 <212> DNA <213> Homo sapien
tacggctgcg agaagacgac agaaggggga agagaggcca ctttttctgc atttctctga
gccctccctg gggcaagtgt cctctcacat catacccatc tctacccagc agaaggctct
cactcaagga ctgtcaagga taactaattc aagacccatc ccacccacta ggtgccaaaa
agctagcaag tcagctacct aataggtgtc ttttgagaca ttcaacacac atagatttaa